The Collections UofT Repository and Enterprise Content Management - use cases from archivists' perspectives for the Islandora digital collections platform.
1. The Collections UofT Repository
and Enterprise Content Management
2014.05.06 - 11:20 AM - 12:00 PM
Toronto/Ryerson/York (TRY) Conference
Carr Hall, Room 406
Sara Allain, Special Collections Librarian, UTSC Library
Kelli Babcock, Digital Initiatives Librarian, UTL ITS
Danielle Robichaud, Archives Assistant, John M. Kelly Library, USMC
Karen Suurtamm, Archivist, UTARMS
Ken Yang, Digital Humanities Application Programmer, UTL ITS
2. Presentation Overview
1. Introducing the Collections UofT platform (Kelli)
2. Use case: UTARMS (Karen)
3. Use case: Nouwen family photograph albums (Danielle)
4. Use case: using OAI-PMH to share metadata (Sara)
4. What is Collections UofT?
Enterprise Content Management Approach
● Problem: How best to manage our digital
projects/digital assets while leveraging limited
resources?
● One possibility: a repository inspired by the
“enterprise content management” framework
● An enterprise content management approach
offers guidelines for the architecture and
management behind a repository, rather than
simply a technical solution
5. Enterprise Content Management
Enterprise Content Management comprises the strategies, processes, methods,
systems, and technologies that are necessary for capturing, creating, managing, using,
publishing, storing, preserving, and disposing content
within and between organizations.
ECMs and Institutional Repositories: The Case for a Unified Enterprise Approach to Content Management
Malcolm Wolski, Natasha Simons and Joanna Richardson, 2013
http://eprints.utas.edu.au/16317/1/THETA_2013_Wolski.pdf
12. Islandora and the U of T Archives
Karen Suurtamm, Archivist
U of T Archives & Records Management Services (UTARMS)
13. Our digitized content
● Lansdale fonds
o Total: 50,000
o Digitized: 27,000
o Online: 3,500
● Photos
o Total: 250,000+
o Online: 2000
● Published material
o Total: 2400+ titles
o Online: 14 titles;
57,000 pages
● Textual records
o Total: 11,000+ m
o Online: barely anything
14. Heritage site
● Initiated by President’s office
● Created by ITS; launched 2012
● Scope: U of T history
● Material from repositories across U of T
o UTARMS
o Fisher
o UTSC
o UTM
18. Benefits of Heritage Site
● Increased exposure
● Search and browse across repositories
● Easy browsing/searching for internal use
● Faceted browsing
● Multimedia: photographs, documents, maps, drawings,
films, books
● Increased collaboration
22. Limitations of Heritage
● Strict mandate/scope
● Another ‘place’ to look for things
● Not linked back to our website and archival descriptions
● Little control over how our material displays
24. Collections:
U of T Archives multi-site
● Central space : decentralized workflow
● More control over collection display
● Own look/feel for our page
● Can build our own menu items: About, Contact, How to
cite, etc.
● Content is unlimited in scope
25. Islandora for Archives: Benefits
● Supports multiple formats
● Open source
o Can build/add/adapt
o Room to grow/change
o community/collaboration
● Pairing with other software
o Exhibits (Timeline etc)
o Preservation (Archivematica)
o Description (AtoM)
26. Islandora for Archives: Challenges
● Communicating that ‘this isn’t everything’
● Use of the term ‘collections’
● Metadata: balancing cohesion and autonomy with other
professions/sectors
● Preserving/communicating context
● Preserving/communicating hierarchical arrangement
32. Meet the Nouwens
Maria and Laurent J.M. Nouwen’s 25th wedding anniversary (April 1956)
P6133
Page 28 of
Album 1
33. Project Overview
● 18 photograph albums donated in 2012
● 650 pages with more than 4000 photographs
● Funding was provided for the digitization and
description of the albums
● Online access key component of agreement
● Albums compiled by the Nouwen family and
Henri Nouwen
38. Why Collections U of T?
● Higher profile for USMC Collections
● Allows for centralized online dissemination
with built-in IT support
● Access to academic audience
● Collections U of T collaborators as built-in
community of practice
39. Challenges
● Photograph albums don’t align with the Book
or Large Image Content models
● Resources, time and knowledge required for
site customization and configuration
● Archival descriptive standards vs Dublin
Core basic elements
42. How Do You Solve a Problem Like UTSC?
● The DSU at UTSC has its own Islandora
o We use it in different ways than ITS
o Not an institutional repository but an organic digital
scholarship tool that we need to be able to
experiment with
● But we think it’s important to contribute!
43. Learn to Share!
● Create a search interface that will find our
stuff via Collections UofT without the effort of
ingesting it twice
● OAI-PMH: Open Archives Initiative Protocol
for Metadata Harvesting
44. Metadata Harvesting via OAI-PMH
● UTSC metadata available via Islandora OAI module
● Collections UofT harvester
● Request made via HTTP
● Dublin Core XML gathered via OAI-PMH
● Metadata made available via Collections UofT site
45. Pros and Cons of OAI-PMH
Advantages:
● Relatively easy
● Minimal duplication of
ingest effort
● No loss if UTSC decides
to change how we use
Collections UofT
Issues:
● User interface changes
● Only harvests simple
Dublin Core
● Out of date - ATOM,
ResourceSync, or LOD
could do this better
I’m going to talk about how the U of T Archives has used Islandora, and share some thoughts about using Collections moving forward
To give some perspective, only a very small fraction of our total content has been digitized for online delivery.
Our most significant digitization project was Lansdale fonds - give numbers
Besides the Lansdale material, we have more than 200,000 photographs and only 1918 are digitized and available online
Recently digitized 57,000 pages from 14 titles via the Internet Archive, but we have more than 2400 titles
As for textual records, things are much less available - of our more than 11,000 metres of textual records, barely anything is available online.
Our first use of Islandora was via the heritage site
Main Heritage site page
able to browse and explore
Browse page
page for single item
one of our staff members researched TimelineJS, which allowed us to put timeline exhibits on the site
this is our exhibit for the 40th anniversary of Robarts - able to highlight images in Heritage and give them some context
Have also posted our UofT chronology on the site
In the future, may convert this to timeline as well
Site has been useful as outreach tool
posting directly on social media - links people to the portal, rather than downloading and posting a static image
that way people can discover more of our content
Heritage did have some limitations
This is our multi-site page on the Collections site
We’ve done little with it so far :)
Moving our efforts from Heritage to Collections has some promise for us
Islandora is used by different disciplines/spheres: libraries, archives, scholars, researchers
Wasn’t built specifically for archives
Means there are particular opportunities and challenges in terms of how archivists might make use of it
Some challenges for the U of T Archives, that we’ll need to think about moving forward
For example, this is the first page of the Robert Lansdale fonds - it is not clear that we’re actually looking at series here (series appear just the same as individual items)
Here we’re looking at individual items within the series
This is a page for a specific item in the series - but its place within the hierarchy is not especially clear
This is even more confusing if we’re looking at a single item that is part of a larger fonds containing various types of records (not just photographs) and is arranged into series etc.
-The Nouwen photograph albums are housed at the Nouwen Archives and Research Collection in the John M. Kelly Library
-Their inclusion on Collections U of T is the result of a project to digitize the albums
-I’m going to talk about the project, discuss why the albums are an important part of the Nouwen Archives and explore some of the benefits and challenges associated with using Collections U of T as a platform for sharing them
-I’ll start by introducing you to the Nouwens
-In North America Henri Nouwen is the most well-known of the Nouwen family - he was a Dutch Catholic priest.
-Interest in the psychological underpinnings of one’s relationship with God
-Perhaps best known for his writings on spirituality
-Spent the last years of his life at L’Arche Daybreak in Richmond Hill, which is one of the reasons the Library acquired his personal papers
-He was the eldest son of Maria Nouwen-Ramselaar and Laurent J.M. Nouwen, both of whom came from large families, and had 4 children of their own.
-Among Henri’s immediate and extended families were prominent lawyers, members of the church and influential business officials.
-His uncle Toon Ramselaar was a Roman Catholic priest
-His brother Paul Nouwen played an influential role in Dutch transportation policy
-Now that everyone is acquainted, I’ll walk you through the project.
[Slide]
-The majority of the albums were put together and cared for by Henri’s mother Maria - her annotations are visible throughout
-A handful of the albums were put together by Henri himself and acquired by the Nouwen family after his death
-The albums are important because they provide insight about Nouwen’s childhood and the relationships he had with his family
-Although Henri kept a great deal of his personal and professional materials, the bulk of what is held at the Archives pertains to his adult life including coursework, teaching materials, journals, drafts, photographs, etc.
-He regularly referenced his family relationships in personal and published works, so the albums provide context to the themes he dealt with later in life.
-Frivolous note: THEY’RE COOL!
-Ages of the albums range from 18 to 82 years
-Older albums date back to the early 1930s
-Majority of albums fall within the 1940s and 60s
-Some albums only have 2 dozen pages, others have over 90
-Maria created albums for each child
-Nijkerk 1932
Scheveningen 1948
1949: Trip to the seminary in Apeldoorn where Henri would later study and his uncle was president
-Scrapbook in nature, put together by Henri as a teen
Photos from Yale, etc. early 1970s
-As part of a much larger campus, federated colleges can get lost in the shuffle
-USMC does not have dedicated IT resources or infrastructure necessary for digital projects
-In addition to providing access to Islandora, they can store and backup our files
-Nouwen material is very popular with off-campus scholars and researchers; Collections U of T is a good way to showcase it to the U of T community and tie it in with teaching subject areas of interest
-Working with a centralized service means we’re not at it alone - we have access to IT expertise, as well as other users working on similar projects and facing similar problems
-Photograph albums aren’t well suited for Islandora content models
-If it’s uploaded as a book the page descriptions are lost
-If it’s uploaded as a large image the descriptions are present but the albums can’t be viewed as a whole
-Absence of next page feature
-Getting content online is a lengthy and involved process that requires a good deal of planning - eg: how will collections be structured, etc.
-Once content is online, customization and configuration isn’t intuitive - it requires time (often in short supply) and a certain ease with technical backend work
-The centralized service addresses absence of IT support, but staffing time and resources still required to get content online and looking good
-A contract may get content online, but it doesn’t address long-term maintenance
-A work study student isn’t going to get content online
-Archival descriptive standards, such as the Rules for Archival Description, don’t align well with Dublin Core
-Context is very important in archival descriptions and DC doesn’t accommodate information like provenance, notes or hierarchical relationships
-We’ve also had to drop information such as photo identification because it’s not possible to format the description field during ingestion -> spaces needed for logical narrative, etc are collapsed into a wall of text
-Where the norm is generally stand alone photos, books or textual records, we’re describing entire pages
-Working with ITS means having access to CONNECT, an internal wiki where info, challenges and solutions can be shared
-We have a built-in support system for sharing challenges and problem-solving solutions
-Created page called Crosswalking Archival and Special Collections descriptions with Dublin Core basic elements
-Includes examples of how other archival collections have been described online
-Maps out confusion regarding the DC guidelines
-Proposes solutions for how to adapt
-Crosswalking as an attempt to standardize (with wiggle room) the way archival descriptions are being mashed into DC
-Example of how the crosswalking work translates
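-To make the crosswalking idea concrete, here is a minimal Python sketch. The mapping is an assumption made for illustration and is not the mapping documented on the CONNECT page; the archival field names are hypothetical labels loosely based on Rules for Archival Description areas, and the sample values are invented. It shows how several archival elements end up sharing one DC element, and how hierarchy can only be hinted at through dc:relation.

```python
# Hypothetical crosswalk: archival description field -> simple DC element.
# These choices are illustrative assumptions, not an agreed standard.
CROSSWALK = {
    "title": "title",
    "creator": "creator",
    "dates_of_creation": "date",
    "extent": "format",
    "scope_and_content": "description",
    "notes": "description",              # no dedicated DC element for notes
    "custodial_history": "description",  # provenance folded into description
    "part_of": "relation",               # hierarchy reduced to a plain text link
    "reference_code": "identifier",
}


def to_dublin_core(archival_record):
    """Map an archival description onto repeatable simple DC elements.

    Returns (dc, unmapped): dc maps each DC element to a list of values,
    unmapped lists source fields that had no place in the crosswalk.
    """
    dc, unmapped = {}, []
    for field, value in archival_record.items():
        target = CROSSWALK.get(field)
        if target is None:
            unmapped.append(field)
            continue
        # Label merged fields so some context survives inside dc:description.
        if target == "description" and field != "scope_and_content":
            value = f"{field.replace('_', ' ').capitalize()}: {value}"
        dc.setdefault(target, []).append(value)
    return dc, unmapped


# Sample values are invented for illustration.
album_page = {
    "title": "Album 1, page 28",
    "dates_of_creation": "April 1956",
    "scope_and_content": "Page from a family photograph album.",
    "notes": "Annotations are handwritten.",
    "part_of": "Album 1",
    "arrangement": "Pages kept in original order.",
}
dc_record, dropped = to_dublin_core(album_page)
```

-In this sketch the "arrangement" field has nowhere to go and is reported as dropped, which is the kind of loss a crosswalk page would need to document.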
FIN
Internal IT support
We’re against the siloization of UTSC!
What we want is a search interface that will allow users to find content from UTSC’s digital collections via the Collections UofT site
But we don’t want to ingest our content twice - once on our server and once on ITS’s - because it’s time consuming to initiate and maintain
So we turn to OAI-PMH! The Open Archives Initiative Protocol for Metadata Harvesting
The OAI was established to create an interoperability framework for institutions that wanted to share digital collections - they came up with the PMH as a technical description of how to make metadata openly available for harvesting
The Collections UofT harvester makes a request via HTTP (the hypertext transfer protocol - it’s how stuff is gathered from servers) to the UTSC Islandora server, which has had the Islandora OAI module enabled
The harvester gathers metadata - Dublin Core XML - from the UTSC server and takes it back to the Collections server, where it gets indexed in Solr and GSearch so that it’s discoverable
Now if someone searches Collections for something like Doris McCarthy or Watts Lecture - UTSC specific terms - they will get search results from our digital material as well as the material made available by UTARMS and our other partners
And if they click through, they’ll be taken to the full record on the UTSC Islandora site
And this request cycle can be scheduled to occur on a regular and frequent basis so that Collections UofT users always have access to the most up-to-date metadata
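As a rough illustration of the harvest cycle described above, here is a minimal Python sketch. The base URL, the sample response and the function names are assumptions made for illustration, not the actual Collections UofT harvester or the UTSC endpoint; only the OAI-PMH verb and parameter names (ListRecords, metadataPrefix, from) and the XML namespaces come from the protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and its oai_dc metadata format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def build_list_records_url(base_url, from_date=None):
    """Build a ListRecords request for simple Dublin Core (oai_dc)."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        # 'from' limits the harvest to records changed since the last run.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return one dict per harvested record: identifier plus DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        for el in rec.iterfind(".//oai_dc:dc/*", NS):
            name = el.tag.split("}", 1)[1]  # strip the namespace from the tag
            fields.setdefault(name, []).append((el.text or "").strip())
        records.append({"identifier": ident, "dc": fields})
    return records


# Invented sample response, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:demo_1</identifier>
        <datestamp>2014-05-01T00:00:00Z</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample lecture recording</dc:title>
          <dc:creator>Example, Person</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

url = build_list_records_url("https://example.org/oai2", from_date="2014-05-01")
records = parse_records(SAMPLE_RESPONSE)
```

A real harvester would fetch the URL over HTTP, follow any resumptionToken the server returns for large result sets, and pass the parsed fields on to the search index; those steps are left out here to keep the sketch short.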
Check in here to make sure audience understands this concept
So, as with any technology there are pros and cons to working with OAI-PMH
On the pro side of things:
OAI-PMH is relatively easy to implement - it’s been around for a long time so the support has been built up, so we have things like the Islandora OAI module to expose the metadata
We don’t have to duplicate our ingestion effort - we’re not actually ingesting any content; once we’ve configured the OAI-PMH system to do what we want it to do, we can just let it run on a regular schedule with minimal interference from us
And if, for whatever reason, we decide to move to Collections altogether, there’s little lost - we didn’t invest a ton of time and effort to do the work, so we have little resource debt if we decide to scrap it
However OAI-PMH certainly has issues:
When you click through to a search result, you end up on the UTSC Islandora site - not Collections. There’s a potential for this to be confusing to users
It is only meant to harvest simple Dublin Core - Islandora does produce this (i.e. will automatically crosswalk MODS to create a DC stream)
Finally, there are better technologies out there that do this much more elegantly - OAI-PMH is over ten years old, which is ancient in tech
Atom is a standard for harvesting web feeds, kind of like an RSS feed - Islandora has it but we haven’t experimented with it yet
ResourceSync is a NISO standard that focuses on quick and accurate server-to-service synchronization
Or Linked Open Data, whereby we openly publish data via RDF (Resource Description Framework specifications) and allow it to be gathered by remote services
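As a rough sketch of the Linked Open Data option, the following Python writes a record's simple Dublin Core out as RDF in N-Triples form. The record URI and values are invented for illustration and the helper names are hypothetical; only the Dublin Core element namespace is real. A production setup would more likely use an RDF library than hand-built strings.

```python
DC_NS = "http://purl.org/dc/elements/1.1/"


def escape_literal(text):
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(subject_uri, dc_fields):
    """Serialize {element: [values]} as one N-Triples line per value."""
    lines = []
    for element, values in dc_fields.items():
        for value in values:
            lines.append(
                f'<{subject_uri}> <{DC_NS}{element}> "{escape_literal(value)}" .'
            )
    return "\n".join(lines)


record_uri = "https://example.org/islandora/object/demo:1"  # hypothetical URI
triples = to_ntriples(record_uri, {"title": ["Sample lecture recording"],
                                   "creator": ["Example, Person"]})
print(triples)
```

Each line is a subject, predicate and object that a remote service could gather and merge with data from other sources, which is the main difference from a harvester pulling whole XML records.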