The document discusses the National Library of Wales' implementation and management of a large archive in AtoM (Access to Memory). It notes that they have nearly 15,000 top-level published records in AtoM and over 800,000 published records in total. It describes how they cache Dublin Core and EAD metadata from AtoM to improve OAI-PMH harvesting performance for their discovery systems. The caching process is CPU and memory intensive: the initial run took months, and updates are now done on a per-archive basis. Scripts are used to automate the caching and updating of records. Future plans include embedding the Universal Viewer in AtoM records and continuing work on OAI-PMH harvesting.
2. Background
Implemented AtoM in 2015 and upgraded to version 2.4 in 2017
14,936 top level published records, 811,230 total published records
Primo (Exlibris) main discovery interface
Harvesting Dublin Core metadata from AtoM via OAI-PMH
Example record in AtoM and same record in Primo
Archives Hub will harvest our EAD metadata from AtoM via OAI-PMH
3. Caching of DC & EAD XML
Caching done on clone of live system and copied across to live
128GB RAM and 8 CPUs – 6 months to cache
Increased to 26 CPUs
Single thread – Multi thread
Generate list of all records for caching
Split the list into smaller lists and spread them over most of the CPUs allocated
2-3 days
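The split-and-spread step above can be sketched in Python. This is an illustrative chunking function, not NLW's actual script; the record count comes from the slides and the worker count of 24 is an assumption (most of the 26 CPUs, leaving a couple free):

```python
# Sketch of the "split the list" step: divide all record IDs into
# roughly equal chunks, one list per worker CPU. Names are illustrative.
def split_into_chunks(items, n_chunks):
    """Return n_chunks lists whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

record_ids = list(range(811230))             # stand-in for all published records
chunks = split_into_chunks(record_ids, 24)   # one chunk per caching worker
```

Each chunk would then be handed to its own caching process, which is what brings the run time down from months to 2-3 days.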
4. Updating cached DC & EAD XML
Auto-caching - not an option for us
Small edit on an average size archive - 1 hr to complete
Caching archives on an individual basis
Archivists inform us when they’ve published or edited an archive
Use the slug to generate a list of all the records that form part of archive
List of records then sent for re-caching
Updates OAI which in turn will update record in Primo
Deletions
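The per-archive update flow above can be sketched as follows. The `cache:xml-representations` command is the one shown later in these slides; how the slug list itself is generated is covered by the utility script in section 6:

```python
# Sketch of the per-archive re-cache step: given the list of slugs that
# form part of one archive, build the re-cache command for each record.
def recache_commands(slugs):
    """Return one symfony re-cache command per record slug."""
    return [f"php symfony cache:xml-representations --slug={s}" for s in slugs]

cmds = recache_commands(["daniel-protheroe-and-rhys-morgan-papers-2"])
```

Running these commands refreshes the cached DC and EAD XML, which updates the OAI feed and, in turn, the record in Primo.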
5. More about the scripts
Get OAI identifier from slug
php symfony nlw:get-oai-identifier --slug=daniel-protheroe-and-rhys-morgan-papers-2
732020
https://archives.library.wales/index.php/;oai?verb=GetRecord&identifier=oai:dalton-clone.llgc.org.uk:_732020&metadataPrefix=oai_dc
Re-cache the EAD and DC XML renditions using the slug, e.g.
php symfony cache:xml-representations --slug=daniel-protheroe-and-rhys-morgan-papers-2
This is done by making a slight modification to the following file:
lib/task/arCacheDescriptionXmlTask.class.php
6. Utility script – to generate list of all slugs that form part of an archive
list_slugs_for_all_archives.py
Re-caching entire archive process
./list_slugs_for_all_archives.py > /tmp/slugs.txt
then
cat /tmp/slugs.txt | parallel "php symfony cache:xml-representations --slug={.}"
Or as a single command
./list_slugs_for_all_archives.py | parallel "php symfony cache:xml-representations --slug={.}"
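As an alternative to GNU parallel, the same fan-out can be done from Python. The symfony command comes from the slides; the worker count and the helper name are illustrative:

```python
# Run shell commands concurrently and collect their exit codes,
# as a Python stand-in for the GNU parallel pipeline above.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_commands(commands, workers=8):
    """Run each shell command in a thread pool; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, shell=True).returncode,
            commands))

# Example (to be run on the AtoM server):
# slugs = open("/tmp/slugs.txt").read().split()
# run_commands([f"php symfony cache:xml-representations --slug={s}" for s in slugs])
```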
7. Loading to Primo
Total refresh of data
On-going updates to be managed via Primo pipe via OAI-PMH
Top level AtoM records dedup with corresponding record in Alma
If the item has been digitised and ingested to Fedora
8. AtoM OAI-PMH Development Work
Phase 1 – 6 institutions (National Library of Wales, University of York, Strathclyde University, The Mills Archive, University of Gloucestershire and Glasgow Caledonian University)
https://digital-archiving.blogspot.com/search/label/OAI-PMH
Phase 2 –
o Expose new records at any level to the harvester
o Alert the harvester to which records have been deleted
9. What’s next for NLW
Embedding Universal Viewer in AtoM
https://archives.library.wales/index.php/llyfr-hugh-hughes-bardd-coch