1. MARC and BIBFRAME
Linked Data for Libraries
Trinity College Dublin
6 November 2014
Thomas Meehan
tom@aurochs.org
@orangeaurochs
2. Card Index Catalogue
http://cardcat.ucl.ac.uk/cgi-bin/carddisplay.pl?card=887;drawer=13;max=931;ctype=C
3. AACR2 in MARC21
245 00 $a Models for decision :
$b a conference under the auspices of the United Kingdom
Automation Council organised by the British Computer
Society and the Operational Research Society /
$c edited by C.M. Berners-Lee.
260 __ $a London :
$b English Universities Press,
$c 1965.
300 __ $a x, 149 p. :
$b ill. ;
$c 23 cm.
504 __ $a Includes bibliographical references.
700 1_ $a Berners-Lee, C. M.
5. AACR2 in .mrc
00788nam a2200181 a
450000100270000000500170002700800410004402400150008524502100010026000
490031030000320035950400410039165000330043270000230046571000390048871
0003000527710004900557_UCL01000000000000000477125_20061112120300.0_85
0710s1965 enka b 000 0 eng _8 _ax280050495_00_aModels for decision :_ba
conference under the auspices of the United Kingdom Automation Council organised
by the British Computer Society and the Operational Research Society /_cedited by
C.M. Berners-Lee._ _aLondon :_bEnglish Universities Press,_c1965._ _ax, 149 p.
:_bill. ;_c23 cm._ _aIncludes bibliographical references._ 0_aDecision
making_vCongresses._1 _aBerners-Lee, C. M._2 _aUnited Kingdom Automation
Council._2 _aBritish Computer Society._2 _aOperational Research Society (Great
Britain)__
Leader
Directory
Data
245 field, final 710 field
6. Finite Notation Problem
Too many subject schemes
650 _0 for LCSH
650 _1 for LC for children's literature
650 _2 for MeSH
…
650 _7 Source specified in subfield $2
Not enough indicators
246 184 $aThe title on the spine
7. Data in More Than One Place
Languages
008 (positions 35-37) eng
041 __ $a eng
240 10 $l English
546 __ $a In English.
12. Data Mixed Up
GMD
245 10 $a Rules on the web
$h [electronic resource] :
$b research and applications /
$c Antonis Bikakis, Adrian Giurca (eds.).
245 10 $a Rules on the web
$b research and applications /
$c Antonis Bikakis, Adrian Giurca (eds.).
Nothing allowed after 245$c
245 10 $a Enduring resistance :
$b cultural theory after Derrida /
$c edited by Sjef Houppermans, Rico Sneller, Peter van Zilfhout. =
La résistance persévère : la théorie de la culture (d')après Derrida / édité par Sjef
Houppermans, Rico Sneller, Peter van Zilfhout.
13. Changing Text as Primary Key for Headings and Authorities
Author heading for deceased person
Niemeyer, Oscar, 1907-
Different preferences for writing name
Mao, Tse-tung, 1893-1976 [Former heading]
Mao, Zedong, 1893-1976
毛泽东, 1893-1976
Small differences could break match
Mao, Zedong, 1893-1976.
Mao, Zedong, 1893-1976
15. Record Not Data
00788nam a2200181 a
450000100270000000500170002700800410004402400150008524502100010026000
490031030000320035950400410039165000330043270000230046571000390048871
0003000527710004900557_UCL01000000000000000477125_20061112120300.0_85
0710s1965 enka b 000 0 eng _8 _ax280050495_00_aModels for decision :_ba
conference under the auspices of the United Kingdom Automation Council organised
by the British Computer Society and the Operational Research Society /_cedited by
C.M. Berners-Lee._ _aLondon :_bEnglish Universities Press,_c1965._ _ax, 149 p.
:_bill. ;_c23 cm._ _aIncludes bibliographical references._ 0_aDecision
making_vCongresses._1 _aBerners-Lee, C. M._2 _aUnited Kingdom Automation
Council._2 _aBritish Computer Society._2 _aOperational Research Society (Great
Britain)__
Leader
Directory
Data
245 field, final 710 field
17. Other Considerations
Only libraries use MARC
– Libraries tied to library-specific software/processes
– Outside agencies can’t take advantage of library data
and standards (See Also: RDA not freely available)
Not even all libraries use MARC
– Archives
– Repositories
– Non-MARC LMSs
19. Selected BIBFRAME Timeline
2002. October. MARC must die (Roy Tennant)
2006. TBL's Linked Data: Design Issues.
2008. January. On the Record (LC report). Suggested that MARC was no longer "fit for purpose"
2011. June. Report and Recommendations of the U.S. RDA Test Coordinating Committee. Reported that "Most felt any benefits of RDA would be largely unrealized in a MARC environment".
2011. October. A Bibliographic Framework for the Digital Age (LC report). LC concludes from the two reports above that it should commit to a "new bibliographic framework".
2012. May. LC begins work with Zepheira on model.
2014. Stabilization of model and experimentation. Still being developed (and argued over)
34. Changes in Cataloguing III
ISBD
|
AACR2
|
MARC
ISBD/FRBR/ISBRBR
|
AACR2/RDA/AACRDA
|
BIBFRAME/schema.org/bibo/dct/foaf/owl/rdf/rdfs/madsrdf/edm/rdau/rdaw/rdae/rdam/rdaa/rdac/frbrer/void/blterms/isbd/skos/wgs84pos/etc
35. Finding Out More
General Information
BIBFRAME website. http://www.loc.gov/bibframe/
BIBFRAME email list.
http://listserv.loc.gov/listarch/bibframe.html
Some Sample Criticism
Greenall, Rurik. Brinxmat's Blog. http://brinxmat.wordpress.com/?s=bibframe
Sanderson, Rob. Differences between BibFrame and other Linked Open Data Approaches.
https://docs.google.com/document/d/1yyVKeYQkBucZqSoQ2qY17vrER46-S6Tw6lY8uqA5xxQ/edit
Pohl, Adrian. Name Authority Files and Linked Data.
http://www.uebertext.org/2014/07/name-authority-files-linked-data.html
Editor's Notes
First, a brief discussion of MARC and why there is felt to be a need to replace it.
Second, an introduction to Bibframe, what it looks like and the kind of issues that surround it.
When being critical of MARC, it's wise and interesting to bear in mind that many of its faults actually stem from the cataloguing rules it encodes: the way the two are so closely bound up together creates many of the problems.
In the beginning was the index card. This is basically text, designed to be brief but readable by people. The elements are separated by punctuation.
MARC was created to essentially mount that index card onto a computer and share it. It is ideally placed to recreate the index card's data for human consumption.
This could be for printing out more index cards, compiling a dictionary catalogue, or even a display screen for an OPAC, at least if you want to recreate the index card layout.
This is the same thing with an RDA record. Although RDA is in theory free of ISBD, MARC21 carries it with it like an albatross round its neck.
In many ways, even this is in reality an abstraction of MARC…
This is what MARC actually looks like! Most of what we actually see of MARC is an abstraction for display or editing. If RDF/XML or JSON-LD looks confusing, it's worth bearing that in mind. And whereas RDF serializations are at least human readable, if not easily so, reading a raw MARC record needs superhuman capabilities.
To get any bit of information out, the data needs to be pulled apart first.
At the top is the Leader, which includes some basic information about the record.
Next is the Directory, which has information about the fields and their length.
The third part is the data itself: note that the MARC field tags are all absent, as apparently are the subfield markers. There are markers there but they are hidden! There is one invisible code to end the record, another to end each field, and a third to mark the beginning of each subfield. In the above example, I've replaced all the hidden codes with underscores so you get an idea of what's there.
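The three-part layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a production parser: it assumes the standard ISO 2709 layout (a 24-character leader, 12-character directory entries, and the control characters 0x1E, 0x1D and 0x1F as the hidden codes), and it counts characters rather than bytes, which is only safe for the ASCII sample used here. The function names `build_record` and `parse_record` are made up for this sketch.

```python
FIELD_END = "\x1e"    # ends each field (and the directory)
RECORD_END = "\x1d"   # ends the record
SUBFIELD = "\x1f"     # starts each subfield


def build_record(fields):
    """Serialise (tag, content) pairs into a raw record string."""
    directory, data = "", ""
    for tag, content in fields:
        body = content + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}"
        data += body
    base = 24 + len(directory) + 1       # leader + directory + its terminator
    total = base + len(data) + 1         # plus the record terminator
    leader = f"{total:05d}nam a22{base:05d} a 4500"
    return leader + directory + FIELD_END + data + RECORD_END


def parse_record(raw):
    """Pull a raw record apart into its leader and a list of fields."""
    leader = raw[:24]
    base = int(leader[12:17])            # where the data part begins
    directory = raw[24:base - 1]         # skip the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop terminator
        if tag < "010":                  # control fields have no subfields
            fields.append((tag, body))
        else:
            indicators, *chunks = body.split(SUBFIELD)
            fields.append((tag, indicators, [(c[0], c[1:]) for c in chunks]))
    return leader, fields


# A cut-down version of the record on the slide.
raw = build_record([
    ("001", "UCL01000000000000000477125"),
    ("245", "00" + SUBFIELD + "aModels for decision :"
            + SUBFIELD + "cedited by C.M. Berners-Lee."),
    ("700", "1 " + SUBFIELD + "aBerners-Lee, C. M."),
])
leader, fields = parse_record(raw)
for field in fields:
    print(field)
```

Note that nothing can be read out of the data part without first consulting the leader and the directory, which is the point being made about the format.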
If you are familiar with the deficiencies of Dewey, which is constrained by its notation, this will ring a bell.
Answer: SuperMARC!
Admittedly, contexts can vary here, but do note the variety of methods for formatting the content too: codes, controlled full-language, language in free text.
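To make the duplication concrete, here is a small sketch that collects every language statement from the four places listed on the slide. The dictionary layout for the record and the function name `language_statements` are assumptions made for illustration, and the 008 string is a simplified stand-in with most positions left blank.

```python
# Hypothetical in-memory record: the 008 is a fixed-length string, other
# fields are lists of occurrences, each a list of (subfield code, value).
record = {
    "008": "850710s1965" + " " * 4 + "enk" + " " * 17 + "eng" + " " * 2,
    "041": [[("a", "eng")]],
    "240": [[("l", "English")]],
    "546": [[("a", "In English.")]],
}


def language_statements(rec):
    """Return (location, value) pairs for every language statement found."""
    found = []
    if "008" in rec:
        found.append(("008/35-37", rec["008"][35:38]))   # coded, by position
    for tag, code in (("041", "a"), ("240", "l"), ("546", "a")):
        for occurrence in rec.get(tag, []):
            for subfield_code, value in occurrence:
                if subfield_code == code:
                    found.append((f"{tag}${code}", value))
    return found


for location, value in language_statements(record):
    print(location, repr(value))
```

Four locations yield three different encodings of the same fact: a code, a controlled term and a free-text sentence. Any program wanting "the language" has to decide which to trust.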
A particularly MARC21 problem:
First, ISBD punctuation only
Second, MARC only
Both of these make sense on their own. However we put them back together!
Third, MARC and ISBD. This makes human editing harder than it needs to be and automatic processing a real headache.
Many MARC data elements contain text. Especially when isolated they become harder to make sense of
In the ISBN examples, qualifiers are included in the subfield. With the new subfield $q, this is at least partially solved…
…as this data is moved elsewhere \o/ … although
… the pricing and availability information still needs punctuation in front of it. 8-(
The title and place of publication both have extraneous punctuation after them too. This is important as the place of publication is not Köln :, it is Köln, or possibly Cologne, or the place we call Cologne, but the Germans call Köln (unless you speak the local dialect: Kölle) but the Romans called Colonia Claudia Ara Agrippinensium, and so on.
The extent example shows the punctuation for an element divorced from the element it represents. This has to be cleaned out before any processing as to its meaning can take place. Even then, the units in this and the dimensions examples are mixed in with the numbers and can be very variable.
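The clean-up step described above might look something like the following sketch. The function names and the regular expressions are assumptions made for illustration; in particular, a trailing full stop is deliberately left alone because it may belong to an abbreviation ("p.", "ill.") rather than to the punctuation scheme.

```python
import re

# Trailing separators that really introduce the *next* element.
TRAILING_ISBD = re.compile(r"\s*[:;/=,]\s*$")


def strip_isbd(value):
    """Remove a trailing ISBD separator from a subfield value."""
    return TRAILING_ISBD.sub("", value).strip()


def split_dimension(value):
    """Separate a number from its unit, e.g. '23 cm.' -> (23.0, 'cm')."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)", value)
    return (float(match.group(1)), match.group(2)) if match else None


print(strip_isbd("Köln :"))                        # Köln
print(strip_isbd("English Universities Press,"))   # English Universities Press
print(strip_isbd("x, 149 p. :"))                   # x, 149 p.
print(split_dimension("23 cm."))                   # (23.0, 'cm')
```

Even this small sketch has to guess: it cannot tell a separator from punctuation that genuinely belongs to the data, which is why mixing the two is such a headache for automatic processing.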
In the first example, some data about the media type is stuck in the middle of the title!
Whatever the benefits of the GMD, this is not a good place to put it.
If we try to take out that one data element we end up with the second example, which is no good either!
In the third example, a lot of fine grained data is essentially dumped in a single box and it would be very hard to retrieve it. Clearly in this case, the record is designed only to provide display.
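The GMD problem can be reproduced in a few lines. This is a sketch only: the list-of-tuples representation of the 245 field and the function names are assumptions, and the repair shown handles just the simple case where the separator sits at the end of $h.

```python
import re

title_field = [
    ("a", "Rules on the web"),
    ("h", "[electronic resource] :"),
    ("b", "research and applications /"),
    ("c", "Antonis Bikakis, Adrian Giurca (eds.)."),
]


def drop_gmd_naively(subfields):
    """Simply delete $h, losing the punctuation stored inside it."""
    return [(code, value) for code, value in subfields if code != "h"]


def drop_gmd_carefully(subfields):
    """Delete $h but hand its trailing separator to the previous subfield."""
    kept = []
    for code, value in subfields:
        if code == "h":
            match = re.search(r"\]\s*([:;/=])\s*$", value)
            if match and kept:
                prev_code, prev_value = kept[-1]
                kept[-1] = (prev_code, f"{prev_value} {match.group(1)}")
            continue
        kept.append((code, value))
    return kept


def display(subfields):
    return " ".join(value for _, value in subfields)


print(display(drop_gmd_naively(title_field)))
print(display(drop_gmd_carefully(title_field)))
```

The naive version produces the broken second example from the slide; the careful version only works because it knows about, and parses, the punctuation convention.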
In the first example, poor Mr Niemeyer has passed away, so the heading changes. This is not helpful for linking or matching. Not to mention the huge maintenance issues.
In the second case, preferences change, or different communities might want to display different strings to users.
The third example shows the kind of small textual issues that can break matching. LMSs indexing these headings have to normalise them to figure out that they match. I had exactly this problem using Cambridge linked data: I searched under the string.
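As a rough illustration of the processing a system has to do, the sketch below builds a normalised match key from a heading. The function name `match_key` and the particular normalisation rules are assumptions, not any specific LMS's algorithm.

```python
import unicodedata


def match_key(heading):
    """Build a crude comparison key from a heading string."""
    text = unicodedata.normalize("NFKC", heading).casefold()
    text = text.rstrip(" .,;")            # ignore trailing punctuation
    return " ".join(text.split())         # collapse runs of whitespace


with_stop = "Mao, Zedong, 1893-1976."
without_stop = "Mao, Zedong, 1893-1976"
former = "Mao, Tse-tung, 1893-1976"

print(with_stop == without_stop)                          # False
print(match_key(with_stop) == match_key(without_stop))    # True
print(match_key(former) == match_key(without_stop))       # False
```

Normalisation rescues the trailing full stop, but no amount of string tidying will connect the former heading to the current one. That needs an identifier that does not change when the text does.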
Both these 700s express a relationship of some kind. But what?
RDA, and the apparently enthusiastic adoption of relationship designators, have at least done something to address this with the $e.
(Although note the weird comma between the heading and the relationship designator!): this is similar to the previous problem with the GMD
To do anything, you have to take the MARC file to bits.
One piece of information will not stand alone.
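By way of contrast, here is a sketch of the same kind of information expressed as separate statements, each of which can stand alone. The record URI under example.org is hypothetical, and the choice of Dublin Core terms is an assumption made purely for illustration; it is not how any particular library publishes its data.

```python
DCT = "http://purl.org/dc/terms/"
book = "http://example.org/bib/000477125"   # hypothetical identifier

triples = [
    (book, DCT + "title", "Models for decision"),
    (book, DCT + "issued", "1965"),
    (book, DCT + "language", "http://id.loc.gov/vocabulary/iso639-2/eng"),
]


def to_ntriples(subject, predicate, obj):
    """Format one statement as a single N-Triples line."""
    if obj.startswith(("http://", "https://")):
        target = f"<{obj}>"                       # a link to another resource
    else:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        target = f'"{escaped}"'                   # a plain text value
    return f"<{subject}> <{predicate}> {target} ."


lines = [to_ntriples(*triple) for triple in triples]
for line in lines:
    print(line)
```

Any one of these lines can be taken away and still makes sense, with no leader or directory needed to interpret it.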
MARC is record based, and that record is a Manifestation record.
However, Expression level elements are spread throughout (in red).
Work and Expression level elements are implied and not explicit (e.g. title).
There are internal relationships here: we know Berners-Lee is an expression level relationship merely by the relationship: the 700 and the $e otherwise give nothing away.
Think of the language example given above: machine-readable expression information is all over the place.
"Many survey respondents expressed doubt that RDA changes would yield significant benefits without a change to the underlying MARC carrier.
"Most felt any benefits of RDA would be largely unrealized in a MARC environment. MARC may hinder the separation of elements and ability to use URIs in a linked data environment.
"While the Coordinating Committee tried to gather RDA records produced in schemas other than MARC, very few records were received."
"Demonstrate credible progress towards a replacement for MARC"
-- Report and Recommendations of the U.S. RDA Test Coordinating Committee. Executive Summary
e.g. Primo converts all data, even MARC, into something else before processing.
Twelve years ago, Roy Tennant addressed many of these issues, recognising in particular that AACR2 and MARC are so closely entwined and focussed on the card catalogue.
- Unreadably esoteric format
- Granularity, e.g. authors' first and last names all in a string. Roles hidden within text in the title field!
- Extensibility, e.g. adding contents notes or additional content
- Clumsy handling of different scripts
- Technical marginalisation, i.e. only libraries use MARC, which limits us to niche vendors
"With the advent of the web, XML, portable computing, and other technological advances, libraries can become flexible, responsive organizations that serve their users in exciting new ways. Or not. If libraries cling to outdated standards, they will find it increasingly difficult to serve their clients as they expect and deserve."
Twelve years later, not much has changed.
A very selective timeline of Bibframe developments. N.B. Zepheira are technically not involved now but are still involved in some related efforts: BIBFLOW (a UC Davis/Zepheira project to look at workflows in an RDA/Bibframe-style environment) and Libhub (a Zepheira initiative looking at increasing the visibility of libraries on the web; current partners: Denver Public Library, Multnomah County Library, German National Library).
Bibframe of course aims to be linked data in order to benefit from all that that means. A good way to approach this is to compare it to an existing effort, e.g. the British Library model.
For comparison this is a single triple from the British Library data model.
Reused existing vocabularies!
This is a theoretical example of Bibframe, imagining that we at UCL have a linked data server set up.
Again, re-used existing vocabularies, except where there was nothing to fit.
Again, links to external resources as well as giving the text of the person's name.
Notice how all the properties are BIBFRAME-specific. BIBFRAME is very like this, unusually so, arguably for reasons of security and control. Schema.org is similar in this respect but is much less ambitious or complicated.
None of this is supported by library systems, and that is part of the point! MARC locks us into library-specific specialised software. Using linked data frees us, at least in theory. There is the danger with BIBFRAME being such an 'official' standard that this is what everyone will follow. Not necessarily a good thing.
This is the basic Bibframe model.
"BIBFRAME has worked on modelling works as Works within the BIBFRAME model, similar to the RDA modelling work, itself modelled on the work on the FRBR model of Works and Expressions. A BIBFRAME Work is a creative work, perhaps a FRBR Work, or an RDA FRBR Work but it also expresses a FRBR Expression, and of course an RDA FRBR Expression. A Work may express another Work based on others’ work, not just a FRBR Work or an RDA Work. That also works. FRBR Works or RDA Works expressed as BIBFRAME Works can relate to FRBR Expressions (BIBFRAME Works or RDA Expressions). So, Works are works that can be Works but also Expressions linked to Works that really are Works."
OCLC arguably have something similar. As do clustering discovery systems.
Where does cataloguing fit into all this?
When US national and major university libraries started testing RDA, they found that MARC21 really couldn't handle it, so they made progress towards a replacement one of their conditions for implementing RDA.
The Library of Congress implemented the Bibliographic Framework Initiative (BIBFRAME), aided by consultants Zepheira and some early experimenters (including the BL, although no longer).