SlideShare a Scribd company logo
1 of 10
Working with large
archives in AtoM
VICKY PHILLIPS
DIGITAL STANDARDS MANAGER
NATIONAL LIBRARY OF WALES
Background
 Implemented AtoM in 2015 and upgraded to version 2.4 in 2017
 14,936 top level published records, 811,230 total published records
 Primo (Exlibris) main discovery interface
 Harvesting Dublin Core metadata from AtoM via OAI-PMH
 Example record in AtoM and same record in Primo
 Archives Hub will harvest our EAD metadata from AtoM via OAI-PMH
Caching of DC & EAD XML
 Caching done on clone of live system and copied across to live
 128GB RAM and 8 CPUs – 6 months to cache
 Increased to 26 CPUs
 Single thread – Multi thread
 Generate list of all records for caching
 Split the list into smaller lists and spread them over most of the CPUs
allocated
 2-3 days
Updating cached DC & EAD XML
 Auto-caching - not an option for us
 Small edit on an average size archive - 1 hr to complete
 Caching archives on an individual basis
 Archivists inform us when they’ve published or edited an archive
 Use the slug to generate a list of all the records that form part of archive
 List of records then sent for re-caching
 Updates OAI which in turn will update record in Primo
 Deletions
More about the scripts
 Get OAI identifier from slug
php symfony nlw:get-oai-identifier --slug=daniel-protheroe-and-rhys-morgan-
papers-2
732020
https://archives.library.wales/index.php/;oai?verb=GetRecord&identifier=oai:dalto
n-clone.llgc.org.uk:_732020&metadataPrefix=oai_dc
 Re-cache EAD and DC xml renditions using slug i.e.
php symfony cache:xml-representations --slug=daniel-protheroe-and-rhys-
morgan-papers-2
This is done my making slight modification to the following file
lib/task/arCacheDescriptionXmlTask.class.php
 Utility script – to generate list of all slugs that form part of an archive
list_slugs_for_all_archives.py
 Re-caching entire archive process
./list_slugs_for_all_archives.py > /tmp/slugs.txt
then
cat /tmp/slugs.txt | parallel "php symfony cache:xml-representations --slug={.}“
Or as a single command
./list_slugs_for_all_archives.py | parallel "php symfony cache:xml-representations --slug={.}"
Loading to Primo
 Total refresh of data
 On-going updates to be managed via Primo pipe via OAI-PMH
 Top level AtoM records dedup with corresponding record in Alma
 If the item has been digitised and ingested to Fedora
AtoM OAI-PMH Development Work
Phase 2
 Phase 1 – 6 institutions (National Library of Wales, University of York,
Strathclyde University, The Mills Archive, University of Gloucestershire and
Glasgow Caledonian University)
 https://digital-archiving.blogspot.com/search/label/OAI-PMH
 Phase 2 –
o Expose new records at any level to the harvester
o Alert the harvester to which records have been deleted
What’s next for NLW
 Embedding Universal Viewer in AtoM
 https://archives.library.wales/index.php/llyfr-hugh-hughes-bardd-coch
Thanks!
Vicky Phillips
vicky.phillips@llgc.org.uk
@vickyfphillips

More Related Content

What's hot

Scaling an invoicing SaaS from zero to over 350k customers
Scaling an invoicing SaaS from zero to over 350k customersScaling an invoicing SaaS from zero to over 350k customers
Scaling an invoicing SaaS from zero to over 350k customersSpeck&Tech
 
How to deploy your Rails application on Windows
How to deploy your Rails application on WindowsHow to deploy your Rails application on Windows
How to deploy your Rails application on Windows曦 徐
 
Deploying JRuby Web Applications
Deploying JRuby Web ApplicationsDeploying JRuby Web Applications
Deploying JRuby Web ApplicationsJoe Kutner
 
Docker Compose and Panamax - ContainerDays Boston - June 2015
Docker Compose and Panamax - ContainerDays Boston - June 2015Docker Compose and Panamax - ContainerDays Boston - June 2015
Docker Compose and Panamax - ContainerDays Boston - June 2015Jonas Rosland
 
CoreOS: Control Your Fleet
CoreOS: Control Your FleetCoreOS: Control Your Fleet
CoreOS: Control Your FleetMatthew Jones
 
CoreOS @Codetalks Hamburg
CoreOS @Codetalks HamburgCoreOS @Codetalks Hamburg
CoreOS @Codetalks HamburgTimo Derstappen
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.comRenzo Tomà
 
CoreOS @ summer meetup in Utrecht
CoreOS @ summer meetup in UtrechtCoreOS @ summer meetup in Utrecht
CoreOS @ summer meetup in UtrechtTimo Derstappen
 
Docker slides
Docker slidesDocker slides
Docker slidesAyla Khan
 
Deploying with JRuby
Deploying with JRubyDeploying with JRuby
Deploying with JRubyJoe Kutner
 
Rubyspec y el largo camino hacia Ruby 1.9
Rubyspec y el largo camino hacia Ruby 1.9Rubyspec y el largo camino hacia Ruby 1.9
Rubyspec y el largo camino hacia Ruby 1.9David Calavera
 
Bind How To
Bind How ToBind How To
Bind How Tocntlinux
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisationgrooverdan
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.Renzo Tomà
 
Logstash family introduction
Logstash family introductionLogstash family introduction
Logstash family introductionOwen Wu
 

What's hot (20)

Scaling an invoicing SaaS from zero to over 350k customers
Scaling an invoicing SaaS from zero to over 350k customersScaling an invoicing SaaS from zero to over 350k customers
Scaling an invoicing SaaS from zero to over 350k customers
 
How to deploy your Rails application on Windows
How to deploy your Rails application on WindowsHow to deploy your Rails application on Windows
How to deploy your Rails application on Windows
 
Deploying JRuby Web Applications
Deploying JRuby Web ApplicationsDeploying JRuby Web Applications
Deploying JRuby Web Applications
 
Docker Compose and Panamax - ContainerDays Boston - June 2015
Docker Compose and Panamax - ContainerDays Boston - June 2015Docker Compose and Panamax - ContainerDays Boston - June 2015
Docker Compose and Panamax - ContainerDays Boston - June 2015
 
CoreOS: Control Your Fleet
CoreOS: Control Your FleetCoreOS: Control Your Fleet
CoreOS: Control Your Fleet
 
CoreOS @Codetalks Hamburg
CoreOS @Codetalks HamburgCoreOS @Codetalks Hamburg
CoreOS @Codetalks Hamburg
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
CoreOS @ summer meetup in Utrecht
CoreOS @ summer meetup in UtrechtCoreOS @ summer meetup in Utrecht
CoreOS @ summer meetup in Utrecht
 
Docker slides
Docker slidesDocker slides
Docker slides
 
Docker n co
Docker n coDocker n co
Docker n co
 
Deploying with JRuby
Deploying with JRubyDeploying with JRuby
Deploying with JRuby
 
Rubyspec y el largo camino hacia Ruby 1.9
Rubyspec y el largo camino hacia Ruby 1.9Rubyspec y el largo camino hacia Ruby 1.9
Rubyspec y el largo camino hacia Ruby 1.9
 
Bind How To
Bind How ToBind How To
Bind How To
 
Linux comands for Hadoop
Linux comands for HadoopLinux comands for Hadoop
Linux comands for Hadoop
 
Clug 2011 March web server optimisation
Clug 2011 March  web server optimisationClug 2011 March  web server optimisation
Clug 2011 March web server optimisation
 
DockerCoreNet
DockerCoreNetDockerCoreNet
DockerCoreNet
 
How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.How bol.com makes sense of its logs, using the Elastic technology stack.
How bol.com makes sense of its logs, using the Elastic technology stack.
 
Logstash family introduction
Logstash family introductionLogstash family introduction
Logstash family introduction
 
Kong in 1.x Territory
Kong in 1.x TerritoryKong in 1.x Territory
Kong in 1.x Territory
 
CoreOS intro
CoreOS introCoreOS intro
CoreOS intro
 

Similar to Working with Large Archives in AtoM

New Oracle Infrastructure2
New Oracle Infrastructure2New Oracle Infrastructure2
New Oracle Infrastructure2markleeuw
 
Performance all teh things
Performance all teh thingsPerformance all teh things
Performance all teh thingsMarcus Deglos
 
Boost Your Environment With XMLDB - UKOUG 2008 - Marco Gralike
Boost Your Environment With XMLDB - UKOUG 2008 - Marco GralikeBoost Your Environment With XMLDB - UKOUG 2008 - Marco Gralike
Boost Your Environment With XMLDB - UKOUG 2008 - Marco GralikeMarco Gralike
 
Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Jérôme Petazzoni
 
cache concepts and varnish-cache
cache concepts and varnish-cachecache concepts and varnish-cache
cache concepts and varnish-cacheMarc Cortinas Val
 
Alfresco DevCon 2018: From Zero to Hero Backing up Alfresco
Alfresco DevCon 2018: From Zero to Hero Backing up AlfrescoAlfresco DevCon 2018: From Zero to Hero Backing up Alfresco
Alfresco DevCon 2018: From Zero to Hero Backing up AlfrescoToni de la Fuente
 
From zero to hero Backing up alfresco
From zero to hero Backing up alfrescoFrom zero to hero Backing up alfresco
From zero to hero Backing up alfrescoToni de la Fuente
 
Apache Airflow @ Blinken OSA
Apache Airflow @ Blinken OSAApache Airflow @ Blinken OSA
Apache Airflow @ Blinken OSAJzsefGborBn
 
Lock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data GuaranteedLock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data GuaranteedJervin Real
 
Oracle Instance Architecture.ppt
Oracle Instance Architecture.pptOracle Instance Architecture.ppt
Oracle Instance Architecture.pptHODCA1
 
Using ACFS as a Storage for EBS
Using ACFS as a Storage for EBSUsing ACFS as a Storage for EBS
Using ACFS as a Storage for EBSAndrejs Karpovs
 
Docker Security Paradigm
Docker Security ParadigmDocker Security Paradigm
Docker Security ParadigmAnis LARGUEM
 
Containerization Is More than the New Virtualization
Containerization Is More than the New VirtualizationContainerization Is More than the New Virtualization
Containerization Is More than the New VirtualizationC4Media
 

Similar to Working with Large Archives in AtoM (20)

New Oracle Infrastructure2
New Oracle Infrastructure2New Oracle Infrastructure2
New Oracle Infrastructure2
 
oracle dba
oracle dbaoracle dba
oracle dba
 
Performance all teh things
Performance all teh thingsPerformance all teh things
Performance all teh things
 
Boost Your Environment With XMLDB - UKOUG 2008 - Marco Gralike
Boost Your Environment With XMLDB - UKOUG 2008 - Marco GralikeBoost Your Environment With XMLDB - UKOUG 2008 - Marco Gralike
Boost Your Environment With XMLDB - UKOUG 2008 - Marco Gralike
 
Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...Containerization is more than the new Virtualization: enabling separation of ...
Containerization is more than the new Virtualization: enabling separation of ...
 
cache concepts and varnish-cache
cache concepts and varnish-cachecache concepts and varnish-cache
cache concepts and varnish-cache
 
Alfresco DevCon 2018: From Zero to Hero Backing up Alfresco
Alfresco DevCon 2018: From Zero to Hero Backing up AlfrescoAlfresco DevCon 2018: From Zero to Hero Backing up Alfresco
Alfresco DevCon 2018: From Zero to Hero Backing up Alfresco
 
From zero to hero Backing up alfresco
From zero to hero Backing up alfrescoFrom zero to hero Backing up alfresco
From zero to hero Backing up alfresco
 
Apache Airflow @ Blinken OSA
Apache Airflow @ Blinken OSAApache Airflow @ Blinken OSA
Apache Airflow @ Blinken OSA
 
Lock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data GuaranteedLock, Stock and Backup: Data Guaranteed
Lock, Stock and Backup: Data Guaranteed
 
Oracle Instance Architecture.ppt
Oracle Instance Architecture.pptOracle Instance Architecture.ppt
Oracle Instance Architecture.ppt
 
AtoM's Command Line Tasks - An Introduction
AtoM's Command Line Tasks - An IntroductionAtoM's Command Line Tasks - An Introduction
AtoM's Command Line Tasks - An Introduction
 
Cloning 2
Cloning 2Cloning 2
Cloning 2
 
Cloning 2
Cloning 2Cloning 2
Cloning 2
 
Scaling PHP apps
Scaling PHP appsScaling PHP apps
Scaling PHP apps
 
Using ACFS as a Storage for EBS
Using ACFS as a Storage for EBSUsing ACFS as a Storage for EBS
Using ACFS as a Storage for EBS
 
11g R2
11g R211g R2
11g R2
 
Docker Security Paradigm
Docker Security ParadigmDocker Security Paradigm
Docker Security Paradigm
 
Lecture2 oracle ppt
Lecture2 oracle pptLecture2 oracle ppt
Lecture2 oracle ppt
 
Containerization Is More than the New Virtualization
Containerization Is More than the New VirtualizationContainerization Is More than the New Virtualization
Containerization Is More than the New Virtualization
 

Recently uploaded

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️anilsa9823
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 

Recently uploaded (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

Working with Large Archives in AtoM

  • 1. Working with large archives in AtoM VICKY PHILLIPS DIGITAL STANDARDS MANAGER NATIONAL LIBRARY OF WALES
  • 2. Background  Implemented AtoM in 2015 and upgraded to version 2.4 in 2017  14,936 top level published records, 811,230 total published records  Primo (Exlibris) main discovery interface  Harvesting Dublin Core metadata from AtoM via OAI-PMH  Example record in AtoM and same record in Primo  Archives Hub will harvest our EAD metadata from AtoM via OAI-PMH
  • 3. Caching of DC & EAD XML  Caching done on clone of live system and copied across to live  128GB RAM and 8 CPUs – 6 months to cache  Increased to 26 CPUs  Single thread – Multi thread  Generate list of all records for caching  Split the list into smaller lists and spread them over most of the CPUs allocated  2-3 days
  • 4. Updating cached DC & EAD XML  Auto-caching - not an option for us  Small edit on an average size archive - 1 hr to complete  Caching archives on an individual basis  Archivists inform us when they’ve published or edited an archive  Use the slug to generate a list of all the records that form part of archive  List of records then sent for re-caching  Updates OAI which in turn will update record in Primo  Deletions
  • 5. More about the scripts  Get OAI identifier from slug php symfony nlw:get-oai-identifier --slug=daniel-protheroe-and-rhys-morgan- papers-2 732020 https://archives.library.wales/index.php/;oai?verb=GetRecord&identifier=oai:dalto n-clone.llgc.org.uk:_732020&metadataPrefix=oai_dc  Re-cache EAD and DC xml renditions using slug i.e. php symfony cache:xml-representations --slug=daniel-protheroe-and-rhys- morgan-papers-2 This is done my making slight modification to the following file lib/task/arCacheDescriptionXmlTask.class.php
  • 6.  Utility script – to generate list of all slugs that form part of an archive list_slugs_for_all_archives.py  Re-caching entire archive process ./list_slugs_for_all_archives.py > /tmp/slugs.txt then cat /tmp/slugs.txt | parallel "php symfony cache:xml-representations --slug={.}“ Or as a single command ./list_slugs_for_all_archives.py | parallel "php symfony cache:xml-representations --slug={.}"
  • 7. Loading to Primo  Total refresh of data  On-going updates to be managed via Primo pipe via OAI-PMH  Top level AtoM records dedup with corresponding record in Alma  If the item has been digitised and ingested to Fedora
  • 8. AtoM OAI-PMH Development Work Phase 2  Phase 1 – 6 institutions (National Library of Wales, University of York, Strathclyde University, The Mills Archive, University of Gloucestershire and Glasgow Caledonian University)  https://digital-archiving.blogspot.com/search/label/OAI-PMH  Phase 2 – o Expose new records at any level to the harvester o Alert the harvester to which records have been deleted
  • 9. What’s next for NLW  Embedding Universal Viewer in AtoM  https://archives.library.wales/index.php/llyfr-hugh-hughes-bardd-coch