Data Migrations
Some Considerations when Preparing to Migrate to AtoM
http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html
Overview
• Assess what data will be part of the migration
• Do any in-system clean up prior to export
• Review AtoM data formats and available fields
• Establish a crosswalk between your data and AtoM’s fields
• Export your data
• Perform any additional clean up as needed
• Transform your data to an AtoM compatible import format
• Import
• Review your work
• Revise and reimport if needed
• Make small clean up edits in AtoM directly
Before Starting
Expect this to take time
• Our average length for a client data migration project is
around 4-6 months. Even for a simple project, there will
be a lot of time needed for data clean-up, quality
assurance review, and reimports.
Expect to do your import more than once
• It’s unlikely that everything will go perfectly on the
first attempt. You’ll discover some records don’t quite
match the same pattern as the rest, or one field didn’t
import, etc. Don’t be discouraged, and do budget your
time with this assumption in mind.
Before Starting
Develop a data management plan while you migrate
• How will you ensure you are not stranding data during the
time of your migration? Will you freeze data entry
entirely for the length? Manage your data in a
spreadsheet? Run a small migration for new data at the
end? Make sure everyone knows the plan.
Clarify roles, deadlines, and communication channels
• Ensure everyone involved knows what is expected of them
throughout the project, and when. Clearly identify those
responsible for key roles, and where to go for support.
Data assessment
Questions to ask in the data assessment phase:
• How many descriptions do you have? How many top-level
records?
• Have the records been described based on any content
standards? (e.g. ISAD(G), RAD, DACS, MAD, MODS, etc.?)
• Are there custom fields with data in your system? How many?
Do they readily map to known standards or not?
• What export formats does your system support?
• Is all record data captured in the exports?
• Are some descriptions “draft” or non-public? Is this
information captured in the export?
Data assessment
Questions to ask in the data assessment phase:
• How many digital objects do you have to migrate? What types
(images, text, video, etc) and formats (e.g. JPG, mp4, etc)
are represented?
• Are authority records maintained separately from
descriptions? What about other entities? Accession records?
• Is the relationship between these entities and descriptions
captured in the export formats available?
• Do these other record types have their own export formats?
(e.g. EAC-CPF XML, SKOS XML, CSV, etc)
• How are hierarchical relationships captured in the export?
AtoM Data Formats
Archival descriptions
• CSV, EAD 2002 XML, MODS XML
Authority records
• EAC-CPF XML, CSV
Accessions
• CSV
Terms (Subjects, Places, Genres, etc.)
• SKOS - many serializations supported
Repository records
• CSV
Current as of version 2.4
AtoM CSV Templates
https://wiki.accesstomemory.org/Resources/CSV_templates
CSV import will be the best way to get data into AtoM -
because the CSV import template is a format specific to
AtoM, there is no data loss and all fields are represented.
If you are able to get your data out of your legacy system
and transform it into an AtoM-compatible CSV format, we
recommend using this method for your migration project.
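For example, a minimal archival description CSV might look like the Python sketch below, written as a small script so the quoting and column order stay consistent. The columns shown (legacyId, parentId, title, levelOfDescription, scopeAndContent, eventDates, eventStartDates, eventEndDates, eventActors, culture) are only a small, assumed subset of the full template - always start from the current CSV template on the wiki rather than this illustration.

import csv

# Assumed subset of the archival description CSV columns -
# copy the full header row from the current template on the AtoM wiki.
columns = [
    "legacyId", "parentId", "title", "levelOfDescription",
    "scopeAndContent", "eventDates", "eventStartDates",
    "eventEndDates", "eventActors", "culture",
]

rows = [
    {
        "legacyId": "F1", "parentId": "", "title": "Example fonds",
        "levelOfDescription": "Fonds",
        "scopeAndContent": "Records documenting ...",
        "eventDates": "1900-1950", "eventStartDates": "1900-01-01",
        "eventEndDates": "1950-12-31", "eventActors": "Example Creator",
        "culture": "en",
    },
]

with open("descriptions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)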
AtoM CSV Templates
https://wiki.accesstomemory.org/Resources/CSV_templates
In the example CSV files from v2.2 onward, we have included the relevant content standard name and number as the sample data in each field. This means you can import the CSV template to produce a sort of “crosswalk” or key, showing how fields in AtoM map to the column headers.
AtoM EAD 2002 XML
Similarly, if you ensure all data entry fields in AtoM are filled in with the related content standard names and numbers, you can then export as EAD XML to generate an EAD - ISAD(G) crosswalk from AtoM.
AtoM EAD 2002 XML
EAD 2002 XML is a flexible standard with many possible valid
but different implementations. For this reason, your locally
generated EAD, while valid, may still not import perfectly
into AtoM. This is why we prefer working with CSV imports
whenever possible.
We recommend running a test import of a representative sample
from your source system into AtoM, and using the crosswalk
method discussed above to evaluate if you will need to make
changes to how your EAD XML is encoded for a successful import
into AtoM.
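As a lightweight pre-check before that test import, you can at least confirm your exported EAD is well-formed and see which elements it actually uses. The short Python sketch below does only that - it is not a substitute for validating against the EAD 2002 schema or for a real test import, and the filename is just an example.

import xml.etree.ElementTree as ET
from collections import Counter

# Example filename - replace with your exported EAD file.
tree = ET.parse("export.xml")  # raises ParseError if the XML is not well-formed

# Count element names (namespace stripped) to see how the file is encoded.
counts = Counter(elem.tag.split("}")[-1] for elem in tree.iter())
for tag, n in counts.most_common(20):
    print(f"{tag}: {n}")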
Crosswalking
Crosswalking is the process of mapping your source
data fields to equivalent AtoM ones.
To do so, you must understand how AtoM handles some
data (such as authority records, terms, etc.) first.
There will be cases where there are no 1:1
equivalencies either - you will have to make decisions
about how to combine or split apart your existing data
to make it work with what is available.
https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
Crosswalking
AtoM is standards-based.
This means you can focus on crosswalking to the
content standard you know best. Use the guidance
provided in the relevant standard to help inform your
mapping.
https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
AtoM Entity Types
A(n incomplete) list of the main entity types around which AtoM was built.
https://www.accesstomemory.org/docs/latest/user-manual/overview/entity-types/
Accessions
Accession records have their own CSV import format. As
there is currently no international accessions standard,
you will need to review the available fields in AtoM
closely and determine where to map your data.
Descriptions can be linked to Accessions via the
accessionNumber column in the description CSV templates.
We recommend importing your Accessions first, then your
descriptions with the corresponding accession number, to
establish links.
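As a sketch of that order, assuming an accessions CSV with accessionNumber and title columns (check the accession template for the full header), you could write both files so the description rows reuse the same accession numbers, then import accessions.csv before descriptions.csv:

import csv

# Hypothetical accession numbers shared between the two files.
accessions = [{"accessionNumber": "2024-001", "title": "Example accession"}]
descriptions = [
    {"legacyId": "D1", "title": "Example fonds", "accessionNumber": "2024-001"},
]

# Import this file first...
with open("accessions.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["accessionNumber", "title"])
    w.writeheader()
    w.writerows(accessions)

# ...then the descriptions, which link via the matching accessionNumber.
with open("descriptions.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["legacyId", "title", "accessionNumber"])
    w.writeheader()
    w.writerows(descriptions)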
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-accessions-via-csv
Actors
In AtoM, creators and name access points are maintained
separately as authority records, so they can be re-used and
linked to multiple descriptions.
This means any creator name or name access point you import
with your descriptions will create an authority record, or
link to an existing match!
Make sure that names are consistent in your data, and the
biographical/administrative history is about the actor only -
not specific to the description.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/authority-records/#authority-bioghist-access
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#on-authority-records-archival-descriptions-and-csv-imports
Actors
The Actor data you can add to a description CSV is minimal -
if you do maintain authority records, then you may want to
import them separately via AtoM’s authority record CSV
templates.
There are 3 actor CSV templates: the main actors template, one to supplement relationship data (between actors and/or resources), and one to supplement alternative forms of name.
We recommend importing authority records before descriptions,
so you can link them on description import.
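A minimal sketch of an authority record CSV follows, assuming columns such as authorizedFormOfName, entityType, history, and culture - verify the exact header against the current authority record template before importing. The authorizedFormOfName should match the creator names and name access points used in your description CSV so the records link on import.

import csv

# Assumed subset of the authority record CSV columns - check the template.
columns = ["authorizedFormOfName", "entityType", "history", "culture"]
actors = [
    {
        "authorizedFormOfName": "Example Creator",  # must match names used in descriptions
        "entityType": "person",
        "history": "Biographical sketch about the actor only.",
        "culture": "en",
    },
]

with open("actors.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerows(actors)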
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#creator-related-import-columns-actors-and-events
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-authority-records-via-csv
Event Dates
Description edit templates have 3 date fields. The
Display date is what the end user will see - it is
free text. The start and end dates must follow ISO
8601 (YYYY-MM-DD, etc) formatting. These fields are
used to support AtoM’s date range search.
• Display date
• Start date
• End date
Event Dates
During CSV import, Creators and Dates are paired (as Events - see the Entity types diagram).
Use the | pipe character to add multiple creators/dates.
You can use a literal NULL value in your CSV file to keep the spacing correct for dates without actors, or vice versa, as in the sketch below:
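For example, a row with two creators but only one dated event could pair the values like this (a minimal sketch; the column subset is illustrative):

import csv

columns = ["legacyId", "title", "eventActors", "eventDates",
           "eventStartDates", "eventEndDates", "culture"]
row = {
    "legacyId": "D2",
    "title": "Correspondence",
    # Two creators, but only the first has dates - NULL keeps the pairs aligned.
    "eventActors": "First Creator|Second Creator",
    "eventDates": "1920-1930|NULL",
    "eventStartDates": "1920-01-01|NULL",
    "eventEndDates": "1930-12-31|NULL",
    "culture": "en",
}

with open("descriptions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerow(row)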
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#creator-related-import-columns-actors-and-events
Access Points
In AtoM, access points on a description (e.g. subjects,
places, genre terms) are maintained separately as terms in a
taxonomy so they can be controlled and reused.
This means that access point data in your description imports
will either create new terms or link to existing ones. Make
sure your data is consistent so you don’t have near-
duplicates later! (e.g. “cars” vs “car” vs “automobiles”)
The exception is name access points - these are authority
records!
See:
• https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/terms/#term-name-vs-subject
Hierarchies
Hierarchies are managed in the description CSV
templates via the legacyId and parentId columns.
Parent records must import in a row above child
records. The children should have the legacyId value
of the parent record in the parentId column.
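A minimal sketch of how the legacyId / parentId pairing might look, with the parent row written before its children (the column subset is illustrative):

import csv

rows = [
    # Parent must appear above its children.
    {"legacyId": "F1", "parentId": "",   "title": "Example fonds",  "levelOfDescription": "Fonds"},
    {"legacyId": "S1", "parentId": "F1", "title": "Example series", "levelOfDescription": "Series"},
    {"legacyId": "I1", "parentId": "S1", "title": "Example item",   "levelOfDescription": "Item"},
]

with open("descriptions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["legacyId", "parentId", "title", "levelOfDescription"])
    writer.writeheader()
    writer.writerows(rows)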
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#hierarchical-relationships
Digital Objects
Digital objects can be imported with descriptions using the digitalObjectURI or digitalObjectPath columns.
URIs point to external, web-accessible resources - they must end in a file extension!
Paths point to a local directory added to your server prior to import.
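For example (hypothetical values only - note the URI ends in a file extension, and the path is an assumed example; check the documentation for how paths are resolved on your server):

import csv

rows = [
    {"legacyId": "I1", "title": "Scanned letter",
     "digitalObjectURI": "https://example.org/images/letter-001.jpg"},
    {"legacyId": "I2", "title": "Oral history",
     "digitalObjectPath": "uploads/migration/interview-002.mp3"},
]

columns = ["legacyId", "title", "digitalObjectURI", "digitalObjectPath"]
with open("descriptions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=columns)  # missing keys default to ""
    writer.writeheader()
    writer.writerows(rows)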
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#digital-object-related-import-columns
Data Transformation
You have 3 main options when it comes to transforming your data into an AtoM-compatible format:
• Manual data transformation
• Tools such as OpenRefine
• Transformation script
http://www.publicdomainpictures.net/view-image.php?image=131381&picture=monarch-butterfly
Data Transformation
OpenRefine is “a free, open source power tool for working with messy data and improving it.”
See:
• http://openrefine.org/
• https://github.com/OpenRefine/OpenRefine
There are many great free resources to help you get started.
Data Transformation
Use OpenRefine to:
• Add AtoM column headers
• Normalize names and terms
• Standardize identifiers or accession numbers
• Split source data into separate columns
• Combine data into a single column
• Delete unnecessary rows
• Global search/replace
• etc.
You can also use OpenRefine to clean up XML data!
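If you would rather script the same kinds of cleanup, here is a minimal Python sketch that trims whitespace and collapses near-duplicate terms in a hypothetical subjectAccessPoints column (the file names, column name, and normalization map are assumptions; OpenRefine gives you equivalent operations interactively, with an undo history):

import csv

# Hypothetical normalization map built after reviewing your source terms.
normalize = {"car": "Automobiles", "cars": "Automobiles", "automobiles": "Automobiles"}

with open("source.csv", newline="", encoding="utf-8") as src, \
     open("cleaned.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        terms = [t.strip() for t in row["subjectAccessPoints"].split("|") if t.strip()]
        terms = [normalize.get(t.lower(), t) for t in terms]
        row["subjectAccessPoints"] = "|".join(dict.fromkeys(terms))  # de-duplicate, keep order
        writer.writerow(row)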
Data Transformation
A transformation script is generally a script prepared by a developer that takes an input (your source data) and runs a series of operations to transform the data into the desired output (an AtoM-compatible file).
These can be prepared in many programming languages (e.g. PHP, Python, etc).
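As an illustration only, a transformation script might look something like the Python sketch below: it reads an export from a hypothetical legacy system, renames columns to their assumed AtoM equivalents based on your crosswalk, and writes an AtoM-style CSV. Real scripts are usually longer - handling dates, hierarchy, and repeated values - and the column names on both sides here are assumptions.

import csv

# Hypothetical crosswalk: legacy column name -> AtoM CSV column name.
CROSSWALK = {
    "RefCode": "identifier",
    "Title": "title",
    "Level": "levelOfDescription",
    "ScopeNote": "scopeAndContent",
    "Dates": "eventDates",
}

def transform(row):
    """Map one legacy row onto the AtoM columns defined in the crosswalk."""
    out = {atom: row.get(legacy, "").strip() for legacy, atom in CROSSWALK.items()}
    out["culture"] = "en"  # the AtoM templates include a culture column; defaulting to English here
    return out

with open("legacy_export.csv", newline="", encoding="utf-8") as src, \
     open("atom_descriptions.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    fieldnames = list(CROSSWALK.values()) + ["culture"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow(transform(row))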
Import Ordering
If you are working with
several different types of
data, you may need to
perform multiple imports,
possibly in different
formats. If so, we
recommend proceeding in
this order to link entities
together as your imports
proceed.
We also recommend running a
smaller sample test first!
1. Terms
2. Repositories
3. Actors
4. Accessions
5. Descriptions
Questions?
info@artefactual.com
http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html
