Transcribing Nano Nagle’s letters using
collaborative transcription services
UCD Digital Library
The Techie Bits
Honora (Nano) Nagle (1718-1784)
Foundress of the Sisters of the Presentation of the
Blessed Virgin Mary (PBVM)
17 original and copy manuscript letters composed by
Nano Nagle over a period of fourteen years.
Nano writing to Miss Mulally (x9)
Nano writing to Fitzsimons (x8)
Letters archived in Dublin, Cork, San Francisco and New York
Part of a much wider Convent Collections
research project, led by Professor Deirdre Raftery,
UCD School of Education
Presentation Sisters Congregational Archive,
Nano Nagle Place, Douglas Street, Cork, Ireland
New Windsor, New York, USA
George's Hill, Dublin 7, Ireland
I had the pleasure of receiving
y’ kind favor, and hope my last letter, has convinc’d
you that it was now neglect on my part, not answer
ing you soon’r, as nothing can give me more real plea…
The Techie Bits
2. Wiki-style editing
3. Version Control
6. Automatic markup
9. OCR Correction
Ingest the collection
Transcribing Nano: Not yet connected…
Transcribing Nano: Upload to FromThePage
Transcribing Nano: Mirador Image Viewer and FromThePage - connected!
Available on each letter’s
Mirador Image Viewer
Pros
• Full text indexing of handwritten material, previously not accessible
• TEI export – can be reused for further research
• Large collections can be transcribed, even with limited resources
• Comfortable wiki-style transcribing environment
• Building communities of new experts and deep engagement
• Projects often draw in experts in a variety of fields
• Interoperability with IIIF (and more!)
• Public and private projects
Cons
• Editorial guidelines needed, which can be ignored by users
• Users need to be comfortable with some mark-up
• Large collections can take a long time to transcribe using volunteers
• Limited GUI controls
• Requires a large amount of moderation
• Not yet really suitable for structured data, like ledgers
• Crowdsourced transcription platforms are amazing
• People really enjoy transcription work (including operational staff) but only do the
content they’re interested in, and at their own pace!
• We learned so much (editorial work takes aaaages!)
• We need to learn more (TEI refresher)
• We had to do a bit of infrastructure building (but IIIF is phenomenal)
• We know that more technical development is needed
• And Nano’s letters can now be used by researchers, enabling digital scholarship
Launching on June 8th, 2018
Digital Library managers
Dr John B. Howard, Julia Barrett
Digital collections team
Dani Montes, Peter Clarke
Audrey Drohan, Órna Roche
Thank you for staying until the very last talk of the conference, bar the closing remarks.
Today I’m talking about collaborative transcription services…
…which UCD Digital Library used to help expose the wonderful content of one of our newest collections.
There’s nothing too technology heavy in this talk, as I know you must be exhausted at this stage!
So…Nano Nagle is the Foundress of the Presentation Sisters, which were set up in 1775.
The tercentenary of her birth will be marked in June 2018, and will include the online publication of 17 of her letters. This collection was brought to us by Principal Investigator Prof. Deirdre Raftery, from the UCD School of Education, who had previously brought us another nun collection called Loreto 1916. #nuntastic seems to be an official tag on Twitter for describing these collections, hence my talk’s title.
The physical letters are curated in Dublin, Cork, San Francisco and New York (clocking up an impressive 17,482 km between them), so by digitising the collection and facilitating the virtual reunification of such geographically dispersed material, we can save researchers quite a few air miles and quite a bit of money!
Creating a digital collection from transatlantic archives was challenging enough – we could only scan the ones held in Dublin! As you can see, even the scans look different from each other.
But this is a collection of handwritten letters, so we needed to figure out how to unlock the content of what are considered ‘OCR-resistant’ texts.
So, we started how we normally start…
The collection was profiled, digitised, given rights statements.
It was also fully catalogued using authorised names and subject headings, and then ingested into our digital repository, Fedora.
But at this stage the text inside the letter...the content…was not fully searchable.
And as you can see…it’s fairly legible…ish!
But we could do with a bit of help!
Deirdre and her team actually did the transcriptions for the letters for us using MS Word…so we technically crowdsourced from a population of three!
We’ve dealt with transcription before but not in any kind of elegant way, and not in a way where we could get help from outside the team.
So we needed a solution that enabled crowdsourcing of transcriptions, and that allowed us to add that content, along with previously transcribed content, to the digital library.
Now…the UCD Digital Library is already quite a complex infrastructure.
We have repository software Fedora in the background, and we’ve implemented the IIIF framework, which Cillian has already described.
IIIF allows us to do amazing things with images within the Mirador Image Viewer.
And the framework also allows us to add value to the content. It starts with a canvas, to which you add the image, and through a manifest (or list of associated resources) you can connect in other things…including transcriptions.
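To make the canvas-and-manifest idea a little more concrete, here is a minimal sketch of what such a manifest might look like, written as a Python dictionary. It assumes the IIIF Presentation API 2.x vocabulary (the talk does not say which version we run), and every URL and the `build_manifest` helper are invented for illustration, not taken from the UCD Digital Library.

```python
import json

# Hypothetical base identifier; not a real UCD Digital Library URL.
BASE = "https://example.org/iiif/letter-1"


def build_manifest(label, image_url, width, height, transcription_list_url=None):
    """Build a one-page manifest in the style of IIIF Presentation 2.x (assumed)."""
    canvas_id = f"{BASE}/canvas/1"
    canvas = {
        "@id": canvas_id,
        "@type": "sc:Canvas",
        "label": "Page 1",
        "width": width,
        "height": height,
        # The image is "painted" onto the canvas by an annotation.
        "images": [{
            "@type": "oa:Annotation",
            "motivation": "sc:painting",
            "resource": {
                "@id": image_url,
                "@type": "dctypes:Image",
                "format": "image/jpeg",
                "width": width,
                "height": height,
            },
            "on": canvas_id,
        }],
    }
    if transcription_list_url:
        # Other resources, such as a transcription, hang off the same canvas.
        canvas["otherContent"] = [{
            "@id": transcription_list_url,
            "@type": "sc:AnnotationList",
        }]
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": f"{BASE}/manifest",
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": [canvas]}],
    }


manifest = build_manifest(
    "Letter 1 (illustrative)",
    "https://example.org/images/letter-1-p1.jpg",
    2000,
    3000,
    transcription_list_url=f"{BASE}/list/transcription-1",
)
print(json.dumps(manifest, indent=2))
```

The point of the sketch is simply that the image and the transcription are both attached to the same canvas, which is what lets a viewer or another platform line them up.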
Now…we evaluated a few different transcription technology platforms, like Scribe and Transkribus, and we chose to go with FromThePage, as it had some features that really appealed to us, like the export of the transcriptions as TEI.
What we needed, and what FromThePage provides, is the ability to push content out onto the platform, let users go from our Image Viewer to that content to help transcribe it, and then pull the transcriptions back into our system, where they are preserved as part of our preservation activities.
Subsequent users can search the full text in the Digital Library, download the TEI to reuse for further research, view the transcribed text on the FtP platform, and eventually we hope to be able to offer the ability to view the transcribed text with the Image Viewer itself.
And how do we do that? Well, thanks to interoperability between the two systems, we can use the IIIF API to push the content into FtP, and use the FtP Contribution API to pull the transcriptions back into the Digital Library.
So, we start the whole workflow by ingesting the digital collection into Fedora and publishing online.
At this stage when you go in, there is no connection with FtP.
To add a collection to FtP, you go to the platform’s dashboard and upload the collection using their import tools.
You can load the files directly by uploading PDF or ZIP files, but in the case of Nano, we were able to use the IIIF manifest for the Nano collection in the Digital Library to pull the collection into FtP.
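As a rough idea of what an importer gets out of a manifest, the sketch below walks a manifest and lists each page with its image. It is not FtP's actual import code; it assumes a IIIF Presentation 2.x style manifest, and the sample data and the `list_pages` helper are made up for illustration.

```python
# Hypothetical, trimmed manifest; in practice it would be fetched from the
# collection's manifest URL and parsed with json.load().
SAMPLE_MANIFEST = {
    "@type": "sc:Manifest",
    "label": "Letter 1 (illustrative)",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [
            {
                "@id": "https://example.org/iiif/letter-1/canvas/1",
                "label": "Page 1",
                "images": [{"resource": {"@id": "https://example.org/images/p1.jpg"}}],
            },
            {
                "@id": "https://example.org/iiif/letter-1/canvas/2",
                "label": "Page 2",
                "images": [{"resource": {"@id": "https://example.org/images/p2.jpg"}}],
            },
        ],
    }],
}


def list_pages(manifest):
    """Return (label, canvas id, image URL) for every canvas in the manifest."""
    pages = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            images = canvas.get("images", [])
            # A canvas may have no image yet, so fall back to None.
            image_url = images[0]["resource"]["@id"] if images else None
            pages.append((canvas.get("label"), canvas["@id"], image_url))
    return pages


for label, canvas_id, image_url in list_pages(SAMPLE_MANIFEST):
    print(label, canvas_id, image_url)
```

Each canvas becomes one page to transcribe, and the canvas identifier is what ties a finished transcription back to the right image later on.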
Once Nano’s letters were uploaded onto the FtP platform, a small red pencil appeared in the Mirador Image Viewer to let us know the letters were available to be transcribed. We could also log in directly to the FtP platform.
Clicking on the red pencil brings you directly into the FtP platform…
…and into edit view, where you can transcribe.
As I said, we already had the transcriptions done so for Nano we copied and pasted them into the editor.
We then marked up some of the content as subjects. Here we only focused on People and Places, but you can customise what can be marked up.
Once that was done, each letter got marked for review. The DL team also acted as editors, so once the collection was reviewed, the Needs Review box got unchecked and we moved on to editing the subjects…
As you can see, the marked up text for the Subjects becomes hyperlinked and these can be mapped to other instances of the same name.
You can go into the subject categories to review them, do further corrections,
augment with additional information,
and link to other instances within the Nano letters collection.
You can also view the relationships between the subjects and the letters they appear in.
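For anyone curious how subjects and letters can be related, here is a small sketch. It assumes the double-square-bracket wiki-link convention (`[[Subject]]` or `[[Canonical name|text as written]]`), which is my understanding of how FtP marks up subjects; the sample sentences and letter identifiers are invented, and this is not FtP's own code.

```python
import re
from collections import defaultdict

# Matches [[Subject]] and [[Canonical name|text as written]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def extract_subjects(text):
    """Return (canonical, verbatim) pairs for every marked-up subject."""
    pairs = []
    for match in WIKILINK.finditer(text):
        canonical = match.group(1).strip()
        verbatim = (match.group(2) or match.group(1)).strip()
        pairs.append((canonical, verbatim))
    return pairs


def subject_index(letters):
    """Map each canonical subject to the sorted list of letters it appears in."""
    index = defaultdict(set)
    for letter_id, text in letters.items():
        for canonical, _verbatim in extract_subjects(text):
            index[canonical].add(letter_id)
    return {subject: sorted(ids) for subject, ids in index.items()}


# Invented sample text, not taken from the real letters.
letters = {
    "letter-1": "I wrote to [[Miss Mulally|Miss M.]] from [[Cork]].",
    "letter-2": "News from [[Dublin]] reached [[Cork]].",
}

for subject, letter_ids in subject_index(letters).items():
    print(subject, letter_ids)
```

Keeping a canonical name separate from the text as written is what allows "Miss M." in one letter to be linked to "Miss Mulally" in another.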
Then the marked up transcriptions for the collection of letters were ready to go back into the Digital Library.
You can export the content using the FtP dashboard – as I already said, we have it set up so that by using FtP’s Contributions API, our Fedora repository can pull the content back into the Digital Library, once it satisfies the criteria for being complete.
Solr can then index the transcriptions for full text searching, and the TEI files become available to download through each letter’s descriptive record…
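To give a flavour of that last step, the sketch below pulls the plain text out of a TEI file and shapes it into the kind of document a search index could take. It is only an illustration: the TEI sample is heavily trimmed, and the field names (`id`, `title`, `transcription_text`) and the `tei_to_solr_doc` helper are assumptions, not our actual Solr schema or ingest code.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Heavily trimmed, illustrative TEI; a real export carries a fuller header.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Letter 1</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>I had the pleasure of receiving
         y\u2019 kind favor</p>
    </body>
  </text>
</TEI>"""


def tei_to_solr_doc(tei_xml, record_id):
    """Flatten a TEI transcription into a dict with hypothetical index fields."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    body = root.find(".//tei:text/tei:body", TEI_NS)
    # Join all text nodes, then collapse line breaks and runs of whitespace.
    raw = "".join(body.itertext()) if body is not None else ""
    return {
        "id": record_id,
        "title": title,
        "transcription_text": " ".join(raw.split()),
    }


doc = tei_to_solr_doc(SAMPLE_TEI, "letter-1")
print(doc)
```

A document like this could then be sent to the index, while the TEI file itself stays untouched for download and reuse.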
Back in the Mirador Image Viewer, it now looks like this…complete with blue icon to denote that there is a transcription available.
Currently to see the transcription and the letter side by side you have to go back into FtP. Hopefully, with future developments, you’ll be able to see the transcription on the same canvas within the Mirador Image Viewer.
So…we may not have used external users with our collaborative transcription service but we did use external experts.
As this is still a pilot service, we were interested to see how we could push a collection through its workflow. And I have to say, there are more Pros than Cons.
The big thing for us was being able to get the transcribed content back into our preservation system, and being able to enable full text searching on it. All of this is possible thanks to IIIF and the interoperability between the UCD Digital Library and FromThePage.
So, in conclusion:
Crowdsourced transcription platforms are great. We’ve added other collections to FromThePage and even without promoting that fact, people are transcribing our content.
We’ve learned loads ourselves through this process.
We’ve had to do quite a bit of technical development, but the Brumfields who created FromThePage are a great team and are very open to making changes (so long as we pay them, bizarrely).
And Nano’s collection of letters has been greatly enhanced by the process, and can now offer scholars the opportunity to engage with the content in new ways.
The Collected Letters of Nano Nagle, complete with transcribed text, will be available from June 8th this year.
Thank you for your attention.