The Reality of Digital Transfer 
@ArchivesNZ 
Ross Spencer, Talei Masters 
Archives New Zealand 
Records Management Network Event, 
Tuesday November 25 2014 
Department of Internal Affairs
Background 
Born Digital and Cultural Heritage Conference 
Melbourne*: http://bit.ly/1utAqz0 
Spencer, Braden, Hutar, Masters, Crouch, Mosely: 
Fly Away Home: Pilot Transfer of Born-digital Records at 
Archives New Zealand 
Collected our experiences from late 2013 through to early 
2014. Royal Commission work through to GDAP Closure 
and beginning of eAccessions. 
* http://playitagainproject.org/conference-report/ 
A missing piece of the jigsaw… 
• An appraisal of the technical challenges 
• The first of a much bigger puzzle? 
• We understood a minimal set of descriptive 
metadata e.g. transfer metadata file; mapping 
of EDRMS fields to that schema 
• But the collection profile was missing – 
technical implications of digital preservation… 
And the numbers were/are huge! 
Royal Commission on the Pike River Coal Mine Tragedy 
Two EDRMS: 
AccessData Summation 
374,264 Files (200GB) 
66,580 Directories 
3,892 Unidentified Objects 
15 Unidentified Extensions 
87 Known Formats 
55,425 Duplicates (Content) 
Analysis time: 108 minutes 
Lotus Notes DMS 
24,190 Files (5GB) 
641 Directories 
1,254 Unidentified Objects 
8 Unidentified Extensions 
62 Known Formats 
6,200 Duplicates (Content) 
Analysis time: 44 minutes
There’s more… 
The Canterbury Earthquakes Royal Commission (partial stats) 
One EDRMS: 
Lotus Notes DMS… (but a different flavour!) 
11,505 Files (57GB) 
246 Directories 
123 Unidentified Objects 
2 Unidentified Extensions 
55 Known Formats 
2,468 Duplicates (Content) 
Analysis time: stats not collected 
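The "Duplicates (Content)" figures on these slides suggest duplicates were found by comparing file content rather than file names. A minimal sketch of that idea follows, assuming SHA-1 checksums (the editor's notes mention SHA1SUM) and a plain directory walk. The function names are hypothetical and this is not the tooling used for the figures above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_content_duplicates(root):
    """Group files under `root` by checksum; keep groups with 2+ members."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha1_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_content_duplicates("/path/to/accession")` returns a mapping from checksum to the list of paths sharing that content. Whether a "duplicate count" means every file in such a group or only the surplus copies is a reporting decision the slides do not state.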
Performance of tools… 
Just one (fairly profound?) example for you… Pike River 
metadata extraction and checksum generation… ‘triage’ 
2949m21.680s 
49 Hours!
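The elapsed time above looks like the minutes-and-seconds form printed by a shell `time` command (an assumption; the slide does not say how it was measured). A small sketch of the conversion to hours, using a hypothetical helper name:

```python
import re


def parse_elapsed(text):
    """Convert an elapsed time such as '2949m21.680s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised elapsed time: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


elapsed = parse_elapsed("2949m21.680s")  # 176961.68 seconds
hours = elapsed / 3600
print(f"{hours:.2f} hours")  # prints "49.16 hours"
```

That is 2,949 minutes divided by 60, which agrees with the "49 Hours!" on the slide.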
Questions already forming… 
• How do we speed things up? 
• How do we make reporting consistent? 
• Where do we begin with this information? 
• Some answers already appearing: stats report is now 
generated by a Python script in response to these 
issues: https://github.com/exponential-decay/droid-sqlite-analysis 
• Relies only on The National Archives’ DROID tool: file 
listing, format ID, and checksumming utility 
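To illustrate the kind of report described here, the sketch below computes the headline figures used throughout these slides from a DROID-style CSV export. It is not the droid-sqlite-analysis script itself. The column names (`TYPE`, `PUID`, `MD5_HASH`) and the `Folder` type value are assumptions about the export layout, and counting every member of a checksum group as a duplicate is one possible convention.

```python
import csv
import io
from collections import Counter


def summarise_droid_rows(rows, hash_column="MD5_HASH"):
    """Summarise an iterable of dict rows from a DROID-style CSV export."""
    rows = list(rows)
    folders = [r for r in rows if r.get("TYPE") == "Folder"]
    files = [r for r in rows if r.get("TYPE") != "Folder"]
    # A file with no PUID is treated as unidentified (assumption).
    unidentified = [r for r in files if not (r.get("PUID") or "").strip()]
    formats = {r["PUID"] for r in files if (r.get("PUID") or "").strip()}
    hash_counts = Counter(r[hash_column] for r in files if r.get(hash_column))
    duplicates = sum(n for n in hash_counts.values() if n > 1)
    return {
        "files": len(files),
        "directories": len(folders),
        "unidentified_objects": len(unidentified),
        "known_formats": len(formats),
        "duplicates_content": duplicates,
    }


# Tiny made-up export, for illustration only.
SAMPLE = """TYPE,NAME,PUID,MD5_HASH
Folder,records,,
File,report.doc,fmt/40,aaa
File,copy-of-report.doc,fmt/40,aaa
File,mystery.xyz,,bbb
"""

summary = summarise_droid_rows(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

Generating the figures from a script rather than by hand is what makes the report repeatable from one accession to the next.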
eAccession One [e1] 
Legacy accessions that give us the opportunity to apply lessons 
learned from the Initial Digital Transfers… 
175 Files (166.5 MB) 
10 Directories 
0 Unidentified Objects 
0 Unidentified Extensions 
7 Known Formats 
0 Duplicates (content) 
eAccession Four [e4] 
eAccessions were seen to be the least complex and allowed 
us to focus, primarily, on the challenge of ingest… 
1295 Files (565.0 MB) 
6 Directories 
2 Unidentified Objects 
1 Unidentified Extension 
12 Known Formats 
2 Duplicates (content) 
Note: An issue obscured in the original statistics… 
A number of false positives! System files 
identified as something more generic. 
Thumbnail preview files and Serif PagePlus 
files can look like MS Office file-like 
objects.
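One way to surface this kind of false positive is a simple rule-based check over the identification results. The sketch below flags rows whose file name marks them as a known system file even though a format was identified. The `NAME` and `PUID` column names and the contents of the name list are assumptions for illustration, not part of DROID or of the reporting script.

```python
# Hypothetical list of system file names that deserve a second look.
DEFAULT_SYSTEM_FILE_NAMES = frozenset({"thumbs.db"})


def flag_possible_false_positives(rows, system_names=DEFAULT_SYSTEM_FILE_NAMES):
    """Return rows that were identified but whose name suggests a system file."""
    flagged = []
    for row in rows:
        name = (row.get("NAME") or "").strip().lower()
        identified = bool((row.get("PUID") or "").strip())
        if identified and name in system_names:
            flagged.append(row)
    return flagged


rows = [
    {"NAME": "Thumbs.db", "PUID": "fmt/111"},
    {"NAME": "minutes.doc", "PUID": "fmt/40"},
    {"NAME": "thumbs.db", "PUID": ""},
]
for row in flag_possible_false_positives(rows):
    print(row["NAME"], row["PUID"])
```

A rule like this only prompts a human check. Whether the identification is "nonsense in context" still depends on the archivist's judgement.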
Technical Challenges in e1 and e4 
• [Tools] Ability to handle multi-byte character encodings, e.g. Māori 
macrons (‘Ā’). 
• [Tools] Unidentified files and false positives. 
• [Tools] Recording of pre-conditioning actions on ingest into digital 
preservation system. 
• [Tools] Implementing CSV ingest mechanism; configuration, code, and 
workflow. 
• [Pre-conditioning / Tools] Ability of the digital preservation system 
(Rosetta) to handle contiguous spaces in filenames. 
• [Pre-conditioning] One invalid JPEG. Required rearrangement of 
application marker segments. 
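The filename and provenance points above can be sketched together: collapse runs of contiguous spaces in a filename and record what was changed, so the pre-conditioning action can travel with the file on ingest. This is a hypothetical illustration, not Rosetta's behaviour or the team's actual workflow. Multi-byte characters such as ‘Ā’ pass through unchanged because Python 3 strings are Unicode.

```python
import re


def precondition_filename(name):
    """Collapse contiguous spaces and return a record of the action taken."""
    cleaned = re.sub(r" {2,}", " ", name)
    return {
        "original": name,
        "preconditioned": cleaned,
        # An empty list means no change was needed.
        "actions": ["collapsed contiguous spaces"] if cleaned != name else [],
    }


record = precondition_filename("Te  Ara   Ā.doc")
print(record["preconditioned"])  # prints "Te Ara Ā.doc"
print(record["actions"])
```

Keeping the original name alongside the new one is what gives the provenance trail; the record could be written out with the rest of the ingest metadata.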
What next..? 
• One step at a time. Accessions e1 and e4; develop capability 
further with e2 and e3. 
• Incorporate metadata extraction tool JHOVE into process 
following experience with e1 and e4, possibly via FITS 
• Refine current metrics and the presentation of statistics, e.g. 
make them more useful for Archivists working on the born-digital 
records we’re already in possession of… 
• Ideal: Archivists’ knowledge (processes, analysis, diagnosis) 
becomes actuated. 
What next..? 
• SCALE! 
Thank you! 



Editor's Notes

  1. ** Stats generated by a prototype analysis tool in concert with The National Archives’ DROID tool – work to do to improve further ** There is a temptation to lump both EDRMS together and look at them as a single accession, but this masks a separate issue: extracting files and metadata, and mapping that metadata, from two different systems is a challenge in itself… These numbers come from the initial transfers project/Government Digital Archive Project (GDAP), which was closed down. Files not ingested. Files remain in custody of DIA Records Team.
  2. ** Stats generated by a prototype analysis tool in concert with The National Archives’ DROID tool – work to do to improve further ** These numbers come from the initial transfers project/Government Digital Archive Project (GDAP), which was closed down. Files not ingested. Files remain in custody of DIA Records Team.
  3. Just the beginning of the data we need to collect A triage dataset to improve decision making JHOVE/Tika ID/File/SHA1SUM Further analysis needed on the analysis!
  4. At this point we’re already seeing the direction we need to take things… Reporting script an output of these questions. Improving consistency / repeatability etc. Reporting script available from GitHub and DROID available from The National Archives, UK website The output of DROID can be drilled into to understand collections, e.g. number of duplicates found across different sets of folders Open source. Useful to agencies embarking on migration project. Collection profiling.
  5. Following the closure of GDAP the Digital Continuity team started work on legacy accessions we were in the possession of. eAccessions. Smaller and less complex. Still enough challenges to push our knowledge forward.
  6. Thumbs.db identified as their family file format – OLE2. Masked their true essence. Serif PagePlus also…
  7. Tool support for Unicode was found lacking throughout the process: Excel, DROID, our digital preservation system, and our own Python script during initial prototyping. We corresponded with the developers of DROID to get the tool to support Unicode. We used pre-release versions of DROID (6.1.4) (as testers) for much of our testing. Work required (collaboration, BL, TNA, SRNSW) to incorporate new identification mechanisms in the DROID tool. Pre-conditioning required on JPEG to provide adequate provenance trail on ingest.
  8. Analogy: A medical doctor isn’t always referring to their books. Their knowledge is inbuilt and instinctive. Example: False positives in format ID. Format recognised, but nonsense in context. Example: Duplicates; knowledge of the workflow in tools is useful in making decisions. Some CERC duplicates came from repetition of email footer images across the collection. Whose problem does this become? Agencies’ (re: management)? Ours (re: storage optimisation (store just one and link?))? A slight distraction. Managing this type of issue up-front is desirable in helping to reduce pain during technical appraisal / ingest.
  9. Simple! We can do this on smaller accessions… It just needs to be scaled!!! ;) How hard can that be?!