Generating bibliographic records as linked data (LD) offers the opportunity for libraries to publish and interlink metadata on the semantic web (SW). This can expose library resources to a larger audience, increase the use of library materials, and allow for more efficient searches. The Digital Resources and Imaging Services (DRIS) department of the Library of Trinity College Dublin (TCD) hopes to move towards publishing their bibliographic records as LD and, therefore, requires a tool that allows for the creation of records in RDF - a model for representing and exchanging LD on the web as structured data.
Although libraries are publishing LD in increasing quantities, there remain many barriers to librarians making full use of the SW, including that many tools used for generating LD are aimed at technical experts. This project explored a means of overcoming some of these barriers through the development of a MODS-RDF cataloguing tool for use in the library domain. MODS is a highly flexible XML metadata schema that can be used to catalogue cultural heritage materials, and MODS-RDF is an expression of this schema in RDF.
A user-centred design approach, which focuses on designing an interface from the perspective of its users, was followed when developing the tool. As such, DRIS was involved in all stages of development, including requirements gathering, interface prototyping and design, and usability testing. The results of the first phase of usability testing indicated that many of the initial user requirements were met and that DRIS were interested in developing the interface further. These results are being used to inspire the second iteration of the tool. Ongoing usability testing will be conducted to ensure that the resulting interface meets DRIS’ unique needs.
By developing a tool that allows DRIS to produce MODS-RDF records, the library will be able to interlink with other LD resources. This could allow library users to access a web of related data from a single information search, making the research process more efficient and potentially inspiring new research through the linking of disparate collections.
MODS-RDF Cataloguing Tool for Trinity College Dublin Digital Resources
1. The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
Development of a MODS-RDF Cataloguing Tool for the Digital Resources and Imaging Services of Trinity College Dublin
Lucy McKenna - ADAPT
Marta Bustillo - UCD Library
Tim Keefe - DRIS, TCD Library
Christophe Debruyne - ADAPT
Declan O’Sullivan - ADAPT
2. www.adaptcentre.ie Digital Resources and Imaging Services - DRIS
• Hosts the Digital Collections Repository of TCD Library
• Open access to TCD’s collection of digitised cultural heritage materials
• Aim to publish and interlink bibliographic metadata on the semantic web
– Software challenges
• Inspired a collaboration between ADAPT & DRIS
3. Semantic Web & Linked Data
• Semantic Web (SW) - Web of data
– relationships between data are defined in a common machine readable format
• Linked Data (LD) - best practices and guidelines for publishing & interlinking datasets on the SW
– Use of HTTP URIs to identify entities
– Resource Description Framework (RDF)
• Model used to represent entities & capture their interlinks
• Triple statements - subject, predicate (relationship), object
4. Why engage with the SW?
• Sharing library metadata on the SW:
– Free metadata from institutional databases
– Improve visibility and use of resources
– Expose data to a larger audience
– Improve metadata sharing & quality
• Interlinking of resources across institutions:
– Improve search accuracy
– Efficient searching
5. Role of the Librarian
• Crowdsourcing?
• LD from authoritative sources e.g. libraries
– Increased degree of credibility
– Used with increased frequency
• Librarians are experts in using authorities and controlled vocabularies
– Consistent identification of similar entities on the SW
– Many already available as LD
– More detailed and accurate searches
6. Challenges
• Limited LD tools for non-technical experts
• LD software not aimed at libraries
• OCLC Survey, 2015
– Investigated the use of LD in libraries
• Reported barriers:
– Few examples of useful library applications of LD
– Difficulties incorporating LD into existing workflows
– Difficulty establishing links
– Lack of authority control on the SW
7. What is MODS?
• Metadata Object Description Schema
– XML metadata schema
– Bibliographic element set
– Describe cultural and bibliographic resources
– 20 top-level elements with associated subelements & attributes
• MODS-RDF Ontology
– Represent bibliographic data as LD
• Digital Library Federation’s Aquifer Initiative
– Implementation guidelines
– Ensures quality & consistency of MODS records
– Interoperable metadata
8. Design Process
• User-centred approach
– Design from user perspective
– Collaborated with DRIS metadata-cataloguer
1. Requirements Gathering
– Interviewed DRIS metadata cataloguer
– Established a set of requirements for the tool
2. Interface Mock-Up
– Based on requirements
19. New Learning & Future Directions
• Collaboration with DRIS
– Understand the role of libraries in the development of the SW
– Identified a need in the library domain
– Bespoke tool design
– Identified areas for future research
• Iterative design and testing of the interface with DRIS
• LD interlinking
23. What the team learned
• Effective communication with computer science researchers
• Better understanding of DRIS’s own cataloguing processes & needs
• Importance of making our expertise visible to other areas of the University
Me – PhD, comp sci, ADAPT, comp sci research centre in TCD, DCU, DIT, UCD
Marta – college liaison librarian, UCD
was metadata cataloguer – Digital Collections, TCD
Digital Resources and Imaging Services
Digital Collections Repository of TCD - open access to the university’s growing collection of digitised cultural heritage materials – books, maps, drawings, paintings, manuscripts
DRIS hopes to move towards publishing their bibliographic data as LD.
- easier interlinking with other cultural heritage linked datasets & increase visibility
DRIS found that their current software setup is unsuitable for publishing LD and that other existing software does not meet their specific needs.
The Semantic Web can be considered as a Web of data, where the relationships between data are defined in a common machine readable format
Linked Data (LD) refers to a set of best practices and guidelines for publishing and interlinking datasets on the SW.
These include the use of HTTP URIs to identify and retrieve entities,
and the use of the Resource Description Framework (RDF) to represent these entities and to capture their interlinks, which may point to descriptions in other LD datasets.
An RDF statement takes the form of a triple, which consists of a subject, predicate and object. URIs are used to identify subjects and relationships/predicates.
The resulting data allows both human and computer-based agents to crawl, explore and discover things on the Semantic Web (SW).
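As a rough illustration of the triple structure described above, the sketch below models a single RDF statement in plain Python and prints it as one N-Triples line. The record URI and title are hypothetical; the predicate is the Dublin Core Terms title property, used here only as a familiar example rather than as part of the tool itself.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str    # URI of the thing being described
    predicate: str  # URI naming the relationship
    obj: str        # another URI, or a quoted literal value


# Hypothetical record URI, for illustration only.
RECORD = "http://example.org/record/1"
DCT_TITLE = "http://purl.org/dc/terms/title"

statement = Triple(RECORD, DCT_TITLE, '"An Example Map"')


def to_ntriples(t: Triple) -> str:
    """Serialise a triple as an N-Triples line; quoted objects are treated as literals."""
    obj = t.obj if t.obj.startswith('"') else f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


print(to_ntriples(statement))
```

Because the subject and predicate are URIs, another dataset could refer to the same record simply by reusing its URI.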
Freeing metadata from institutional databases and sharing it on the SW:
Increases the visibility and use of resources
Exposes data to a larger audience
Increases metadata sharing, accessibility, and quality
Reduces cataloguing time and costs
Interlinking of resources across institutions:
Improved search accuracy
More efficient information searches – guide researchers and library users to a web of related data from a single information search [5].
LD could be generated by technical experts or through crowdsourcing. However:
LD from authoritative sources:
Increased degree of credibility
Used with increased frequency
Librarians are experts at using controlled authorities and vocabularies
Consistent identification and linking of similar entities
Many library authorities and controlled vocabularies already available as LD
Librarians are experts in using these resources
Evolve the SW into a rich and trustworthy information network
LD tools are limited for LD users who do not have a technical background
Not developed with libraries in mind – skills, needs, workflow
Lack of usability and utility testing with librarians
Online Computer Library Center (OCLC) Survey, 2015:
benefits
expose data to a larger audience,
enhancing the library's metadata,
improving search accuracy
Similar challenges were experienced by DRIS
- decided to design a bespoke cataloguing interface that would generate bibliographic records in MODS-RDF
The Metadata Object Description Schema (MODS) is an XML schema for a bibliographic element set that can be used to catalogue library materials
It was developed by the MARC Standards Office of the Library of Congress in response to a need for a simplified XML version of MARC21.
Although less complex than MARC, MODS is a richer alternative to other popular metadata schemas such as Dublin Core.
It allows for the display of hierarchical relationships within, to and from resources.
The Digital Library Federation’s (DLF) Aquifer Initiative has developed a set of implementation guidelines which outline the requirements for describing digital cultural-heritage and humanities-based scholarly resources using MODS.
displayLabel – This attribute provides additional text associated with the element if needed for display purposes
language
A user-centred approach was followed to ensure that the end product was tailored for use in the library domain, meeting the specific needs of DRIS.
Requirements Gathering
- To facilitate the timely and efficient creation of bibliographic records by automating input
- To produce MODS records that meet DLF-Aquifer requirements by forcing data entry for required fields and constraining data entry options for others
Mock-Up
The first phase of the design process was to develop a mock-up of the cataloguing tool.
Inspiration for the mock-up was drawn from the user requirements and also from other cataloguing interfaces.
The mock-up was demonstrated to DRIS and changes were made based upon the feedback given.
On the interface homepage users can create new MODS records and also view/edit previously created records.
Clicking on a record brings the user to the Record View which displays all top-level elements for that record in separate tabs
The interface was designed to initially constrain data entry options to only those elements and sub-elements which were identified as required fields in the DLF Guidelines.
This ensured that the minimal data requirements for each record were met prior to the addition of supplementary metadata.
Once these fields were complete, the interface was programmed to expand data entry options to include recommended and optional fields.
Additional subelements and attributes were hidden behind buttons.
Data entry fields and dropdown menu options were programmed to dynamically alter based on prior selections made during the cataloguing process.
This ensured that data entry was constrained to the recommendations set by the DLF Guidelines thus reducing the possibility of metadata errors.
Data entry options were further constrained by the requirements set by DRIS, ensuring that published records met their specific needs.
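The staged behaviour described above (required fields first, then recommended and optional fields) could be sketched roughly as follows. The grouping of field names into levels is hypothetical and chosen only for illustration; the actual levels are set by the DLF Guidelines and by DRIS, and this is not the tool's real code.

```python
# Hypothetical grouping of MODS elements into levels, for illustration only.
REQUIRED = ["titleInfo", "typeOfResource", "originInfo"]
RECOMMENDED = ["name", "subject"]
OPTIONAL = ["note"]


def available_fields(record: dict) -> list:
    """Return the fields the form should offer for the current record state."""
    missing = [f for f in REQUIRED if not record.get(f)]
    if missing:
        # Required fields are incomplete: offer only the required set.
        return list(REQUIRED)
    # All required fields are filled: expand to the remaining levels.
    return REQUIRED + RECOMMENDED + OPTIONAL


print(available_fields({"titleInfo": "An Example Map"}))
```

Keeping the levels in plain lists means the constraint can be adjusted without touching the form logic.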
The data for certain fields was automatically populated based on previous selections made by the user.
For example, in the Name element, when the user selects an authority, the authority URI self-populates.
The URI for role term values also self-populates.
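A minimal sketch of this kind of self-population is a lookup keyed on the selected authority. The function name, form structure and table contents below are assumptions made for illustration; the two URIs are the public base addresses of the Library of Congress Name Authority File and VIAF.

```python
# Hypothetical lookup table mapping an authority code to its base URI.
AUTHORITY_URIS = {
    "naf": "http://id.loc.gov/authorities/names",
    "viaf": "http://viaf.org",
}


def on_authority_selected(form: dict, authority: str) -> dict:
    """Return a copy of the form with authorityURI filled in for the chosen authority."""
    updated = dict(form, authority=authority)
    # Unknown authorities leave the URI blank for manual entry.
    updated["authorityURI"] = AUTHORITY_URIS.get(authority, "")
    return updated


print(on_authority_selected({"namePart": "Example, Author"}, "naf"))
```

Role term URIs could be filled in the same way from a second table keyed on the role term.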
Saved data entries are stored in a relational database.
Think-aloud observation
The interface was tested by observing the DRIS metadata cataloguer using the tool to create a bibliographic record for an item in the repository. During the observation the participant was asked to provide verbal feedback on the tool.
The test indicated that, although the metadata cataloguer was satisfied with the direction of the tool, there were some issues with the layout of certain buttons and data entry fields on the interface.
Make more intuitive for new users
Clarify required, recommended versus optional fields
A number of new requirements were also identified. These included facilitating the interlinking of published RDF records with other LD datasets on the SW, and adding a tab that would allow the user to view an entire record on one page.
In order to publish the records entered into the cataloguing tool in RDF, an R2RML mapping was developed based on the MODS-RDF ontology.
R2RML is a W3C Recommendation for declaring mappings from relational databases to RDF datasets.
Using these mappings, MODS-RDF records were generated for a small sample of DRIS’s materials.
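To give a feel for what such a mapping does, the sketch below uplifts rows from a relational table into triples in plain Python. It is not R2RML itself: the table layout, URI template and property URIs are all hypothetical, and a real mapping would be declared in R2RML against the MODS-RDF ontology and executed by an R2RML processor.

```python
import sqlite3

# Hypothetical URI template and column-to-property mapping, for illustration only.
BASE = "http://example.org/record/"
COLUMN_TO_PREDICATE = {
    "title": "http://example.org/vocab#title",
    "abstract": "http://example.org/vocab#abstract",
}


def uplift(conn: sqlite3.Connection) -> list:
    """Turn each row of the `record` table into (subject, predicate, object) triples."""
    triples = []
    cols = ", ".join(COLUMN_TO_PREDICATE)
    for row in conn.execute(f"SELECT id, {cols} FROM record ORDER BY id"):
        subject = f"{BASE}{row[0]}"  # plays the role of a subject URI template
        for predicate, value in zip(COLUMN_TO_PREDICATE.values(), row[1:]):
            if value is not None:  # NULL columns produce no triple
                triples.append((subject, predicate, value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT, abstract TEXT)")
conn.execute("INSERT INTO record VALUES (1, 'An Example Map', NULL)")
print(uplift(conn))
```

Each column maps to one predicate and the primary key is used to mint the subject URI, which mirrors the general shape of a relational-to-RDF mapping.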
A number of SPARQL (RDF query language) queries were then successfully run over the RDF dataset.
These queries reflected typical searches conducted by library users, e.g. search by author, title, abstract, and place of publication.
Publishing detailed authoritative bibliographic records in RDF also allowed for more specific information searches, for example searching by LC Subject Headings, LC Genre/Form Terms.
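The idea behind such a search can be sketched without a triple store by matching patterns over a small list of triples. The data and property URIs below are hypothetical and do not come from the DRIS dataset; a real query would be written in SPARQL against the MODS-RDF vocabulary.

```python
# Hypothetical triples and property URIs, standing in for the generated dataset.
V = "http://example.org/vocab#"
TRIPLES = [
    ("http://example.org/record/1", V + "title", "An Example Map"),
    ("http://example.org/record/1", V + "author", "A. Cartographer"),
    ("http://example.org/record/2", V + "title", "Another Item"),
    ("http://example.org/record/2", V + "author", "B. Writer"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a query variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def titles_by_author(triples, author):
    """Roughly: SELECT ?title WHERE { ?r :author <author> . ?r :title ?title }"""
    records = [s for s, _, _ in match(triples, p=V + "author", o=author)]
    return [o for r in records for _, _, o in match(triples, s=r, p=V + "title")]


print(titles_by_author(TRIPLES, "A. Cartographer"))
```

A search by subject heading would follow the same two-step pattern, with the heading's URI in place of the author string.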
Although MODS-RDF records were successfully generated, not all MODS subelements are part of the MODS-RDF ontology, thus some bibliographic metadata was not included in the RDF output.
In order to include this data, the Metadata Authority Description Schema (MADS) must also be used.
MADS is an XML schema for an authority element set that can be used to add metadata regarding authoritative entities [11].
MADS can be used as a companion to MODS in order to provide data for authoritative entities used in a MODS record.
MODS and MADS share many common subelements; however, the MODS-RDF ontology includes only what is solely part of MODS.
Therefore, in order to generate a full MODS bibliographic record in RDF, both the MODS-RDF and MADS-RDF ontologies must be used.
Using both ontologies ensures the inclusion of subelements that are shared across MODS and MADS.
In the process of adding MADS properties to the R2RML mappings, it was noted that, unlike MODS-RDF, some MADS-RDF properties are grouped in RDF Collections, which are a special RDF construct to represent lists.
This grouping allows for labels, such as title and name, to be reconstructed with all elements in the correct order.
However, at the time, R2RML did not support the mapping of RDF collections, thus initially some metadata could not be uplifted and published as part of the RDF record output.
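To show why collections need special handling, the sketch below encodes an ordered list as an RDF Collection using the standard rdf:first, rdf:rest and rdf:nil terms, then walks the chain to recover the original order. The blank node labels and the example name parts are hypothetical, and this plain-Python version illustrates the structure only, not the R2RML extension itself.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def to_collection(items, prefix="_:n"):
    """Encode an ordered list as rdf:first/rdf:rest triples; returns (head, triples)."""
    if not items:
        return RDF + "nil", []
    nodes = [f"{prefix}{i}" for i in range(len(items))]  # hypothetical blank node labels
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def from_collection(head, triples):
    """Walk rdf:rest links from the head to recover the items in order."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    while head != RDF + "nil":
        items.append(lookup[(head, RDF + "first")])
        head = lookup[(head, RDF + "rest")]
    return items


head, triples = to_collection(["Trinity College", "Library"])
print(from_collection(head, triples))
```

Each list item needs its own node linked to the next, so a single relational row can expand into a chain of triples rather than one triple per column.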
This issue inspired a separate project which explored the development of a minimal R2RML extension that would allow for the generation of MADS-RDF in a self-contained way, that is, without relying on additional pre- or post-processing steps.
This prototype was successfully tested on a small sample of records from DRIS, thus a full MODS record could be generated.
Future research will focus on making the tool more user-friendly, adding supplementary features to the interface, and testing subsequent iterations with increased numbers of participants.
Research will also explore how to engage librarians in the process of interlinking LD datasets.