The document discusses the National Library of Wales' implementation and management of a large archive in AtoM (Access to Memory). It notes that they have nearly 15,000 top-level published records in AtoM and over 800,000 published records in total. It describes how they cache Dublin Core and EAD metadata from AtoM to improve OAI-PMH harvesting performance for their discovery systems. The caching process is CPU and memory intensive: the initial run took months, and updates are now done on a per-archive basis. Scripts are used to automate the caching and updating of records. Future plans include embedding the Universal Viewer in AtoM records and continuing work on OAI-PMH harvesting.
2. Background
Implemented AtoM in 2015 and upgraded to version 2.4 in 2017
14,936 top level published records, 811,230 total published records
Primo (Exlibris) main discovery interface
Harvesting Dublin Core metadata from AtoM via OAI-PMH
Example record in AtoM and same record in Primo
Archives Hub will harvest our EAD metadata from AtoM via OAI-PMH
3. Caching of DC & EAD XML
Caching done on clone of live system and copied across to live
128GB RAM and 8 CPUs – 6 months to cache
Increased to 26 CPUs
Single thread – Multi thread
Generate list of all records for caching
Split the list into smaller lists and spread them over most of the CPUs allocated
2-3 days
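The split-and-spread step above can be sketched in Python. This is an illustrative chunking function, not NLW's actual script; the record count comes from the slides and the worker count of 24 is an assumption (most of the 26 CPUs, leaving a couple free):

```python
# Sketch of the "split the list" step: divide all record IDs into
# roughly equal chunks, one list per worker CPU. Names are illustrative.
def split_into_chunks(items, n_chunks):
    """Return n_chunks lists whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

record_ids = list(range(811230))             # stand-in for all published records
chunks = split_into_chunks(record_ids, 24)   # one chunk per caching worker
```

Each chunk would then be handed to its own caching process, which is what brings the run time down from months to 2-3 days.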
4. Updating cached DC & EAD XML
Auto-caching - not an option for us
Small edit on an average size archive - 1 hr to complete
Caching archives on an individual basis
Archivists inform us when they’ve published or edited an archive
Use the slug to generate a list of all the records that form part of archive
List of records then sent for re-caching
Updates OAI which in turn will update record in Primo
Deletions
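The per-archive update flow above can be sketched as follows. The `cache:xml-representations` command is the one shown later in these slides; how the slug list itself is generated is covered by the utility script in section 6:

```python
# Sketch of the per-archive re-cache step: given the list of slugs that
# form part of one archive, build the re-cache command for each record.
def recache_commands(slugs):
    """Return one symfony re-cache command per record slug."""
    return [f"php symfony cache:xml-representations --slug={s}" for s in slugs]

cmds = recache_commands(["daniel-protheroe-and-rhys-morgan-papers-2"])
```

Running these commands refreshes the cached DC and EAD XML, which updates the OAI feed and, in turn, the record in Primo.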
5. More about the scripts
Get OAI identifier from slug
php symfony nlw:get-oai-identifier --slug=daniel-protheroe-and-rhys-morgan-papers-2
732020
https://archives.library.wales/index.php/;oai?verb=GetRecord&identifier=oai:dalton-clone.llgc.org.uk:_732020&metadataPrefix=oai_dc
Re-cache the EAD and DC XML renditions using the slug, e.g.
php symfony cache:xml-representations --slug=daniel-protheroe-and-rhys-morgan-papers-2
This is done by making a slight modification to the following file:
lib/task/arCacheDescriptionXmlTask.class.php
6. Utility script – to generate list of all slugs that form part of an archive
list_slugs_for_all_archives.py
Re-caching entire archive process
./list_slugs_for_all_archives.py > /tmp/slugs.txt
then
cat /tmp/slugs.txt | parallel "php symfony cache:xml-representations --slug={.}"
Or as a single command
./list_slugs_for_all_archives.py | parallel "php symfony cache:xml-representations --slug={.}"
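As an alternative to GNU parallel, the same fan-out can be done from Python. The symfony command comes from the slides; the worker count and the helper name are illustrative:

```python
# Run shell commands concurrently and collect their exit codes,
# as a Python stand-in for the GNU parallel pipeline above.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_commands(commands, workers=8):
    """Run each shell command in a thread pool; return exit codes in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda cmd: subprocess.run(cmd, shell=True).returncode,
            commands))

# Example (to be run on the AtoM server):
# slugs = open("/tmp/slugs.txt").read().split()
# run_commands([f"php symfony cache:xml-representations --slug={s}" for s in slugs])
```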
7. Loading to Primo
Total refresh of data
On-going updates to be managed via Primo pipe via OAI-PMH
Top level AtoM records dedup with corresponding record in Alma
If the item has been digitised and ingested to Fedora
8. AtoM OAI-PMH Development Work
Phase 1 – 6 institutions (National Library of Wales, University of York, Strathclyde University, The Mills Archive, University of Gloucestershire and Glasgow Caledonian University)
https://digital-archiving.blogspot.com/search/label/OAI-PMH
Phase 2 –
o Expose new records at any level to the harvester
o Alert the harvester to which records have been deleted
9. What’s next for NLW
Embedding Universal Viewer in AtoM
https://archives.library.wales/index.php/llyfr-hugh-hughes-bardd-coch