Digitization Basics for Archives and Special Collections – Part 2: Store and Share

Catherine Phan, Metadata Librarian, University of Wisconsin Digital Collections Center
Jesse Henderson, Digital Services Librarian, University of Wisconsin Digital Collections Center
Steven Dast, Digital Asset Librarian, University of Wisconsin Digital Collections Center

This is the second part of a two-part, full-day workshop introducing the core elements of creating digital collections of historic photographs, documents and other archival materials. Part 2 focuses on sharing your digitized materials with the world and on the steps you can take to ensure that they'll remain usable and accessible into the future. We'll define metadata, explain why it's important, and consider approaches to creating descriptive metadata for discovery of historical resources. We'll examine the issue of digital preservation, including practical steps you can take to preserve your digital content with limited resources. And we'll think about digitization as a path to community engagement, including reaching out to your community for content and promoting your digital collections to your users.

  1. Digital Preservation
     Digitization Basics for Archives and Special Collections – Part 2: Store and Share
     WiLSWorld 2015
  2. UW Digital Collections Center
     Steven Dast (SD), Digital Asset Librarian
     Jesse Henderson (JH), Digital Services Librarian
     Cat Phan (CP), Digital Services Librarian
  3. “We'll examine the issue of digital preservation . . . ”
  4. “. . . including practical steps you can take to preserve your digital content with limited resources.”
  5. Characteristics of digital information
     Strengths
     ● Easy to make and transmit perfect copies
     ● Machine-readable content and metadata facilitate automation
     ● Storage relatively inexpensive and becoming more so
     Challenges
     ● Fragile, easily malleable
     ● Storage media not durable
     ● High density of storage
     ● Requires technology to render into human-readable form
       o Obsolescence
       o Early signs of loss may not be apparent
       o Loss generally extensive
  6. Two primary stages in digital lifecycle
     Creation stage
     ● Intense, focused action
     ● Maximize value of digital material
     ● Risk of errors
     Preservation stage
     ● Long-term, sporadic action
     ● Minimize cost of maintenance
     ● Risk of failures
  7. Strategies for digital preservation
     Take advantage of our strengths:
     ● Make lots of copies in different places
     ● Automate file handling and management
     Take steps to minimize challenges...
  8. Strategies for both phases
     Use broadly supported standard file formats that store uncompressed data:
     TIFF for images
     WAV for audio
     ● Mitigates obsolescence and data fragility
     ● Facilitates future bulk processing
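File extensions can be wrong, so a cheap automated check is to read the leading "magic" bytes of each master file. The sketch below is illustrative and is not a UWDCC tool. It recognizes only classic TIFF and RIFF/WAVE headers, so BigTIFF and other variants would be reported as unknown. The function names are hypothetical.

```python
from pathlib import Path

# Classic TIFF starts with a byte-order mark ("II" = little-endian, "MM" = big-endian)
# followed by the number 42 written in that byte order.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")

def sniff_format(header: bytes) -> str:
    """Classify a file from its first 12 bytes as 'tiff', 'wav', or 'unknown'."""
    if header[:4] in TIFF_SIGNATURES:
        return "tiff"
    # WAV files are RIFF containers whose form type (bytes 8-11) is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"

def unexpected_files(root: str):
    """Yield files under `root` that are neither classic TIFF nor WAV."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            with open(path, "rb") as f:
                if sniff_format(f.read(12)) == "unknown":
                    yield path
```

Running `unexpected_files` over a folder of masters before archiving flags anything that slipped in under the wrong format.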
  9. Strategies for both phases
     Work as consistently as possible; keep good records; document special cases
     ● Reduces cost of future preservation actions
  10. Strategies for both phases
      Use a file naming system that is simple and consistent, but flexible.
      Remember: whatever system you choose is (almost) entirely for your convenience; to the computer, file names are all just strings of characters. Nevertheless, tool requirements (if and when they exist) override any other factors.
  11. Strategies – File naming
      Avoid spaces and special characters (/ : * ? ")
      Use letters and numbers, underscore ( _ ), hyphen ( - ).
      Dot ( . ) is okay, but has a special function: it separates the name from the extension.
      For broadest compatibility, use the 8.3 convention (at most eight characters, a dot, and a three-character extension).
      Don't use capitalization for meaningful differences (many file systems ignore case).
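These rules are easy to check automatically before files enter a workflow. The sketch below is minimal. Its regular expressions encode one reading of the slide: letters, digits, underscore and hyphen, with a single dot before the extension. Treat that pattern as an assumption and adjust it to your own conventions.

```python
import re

# ASSUMED rule set: safe characters plus at most one dot before an extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")
# 8.3: up to 8 name characters, optionally a dot and up to 3 extension characters.
EIGHT_THREE = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9]{1,3})?")

def check_name(name: str, strict_83: bool = False) -> list:
    """Return a list of problems with a file name (empty list = acceptable)."""
    problems = []
    if not SAFE_NAME.fullmatch(name):
        problems.append("contains spaces, special characters, or extra dots")
    if strict_83 and not EIGHT_THREE.fullmatch(name):
        problems.append("does not fit the 8.3 convention")
    return problems

def case_collisions(names) -> list:
    """Group names that differ only by capitalization (e.g. 'A.tif' vs 'a.tif')."""
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return [group for group in groups.values() if len(group) > 1]
```

For example, `check_name("uwar02345.tif", strict_83=True)` passes the character rules but fails 8.3, because the name part has nine characters.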
  12. Strategies – File naming
      Effects and side effects of file names
      ● Identity
      ● Order / sequencing (0122.tif 0123.tif 0124.tif)
      ● Collocation / grouping (ncb01.tif ncb02.tif ncb03.tif mca01.tif mca02.tif)
  13. Strategies – File naming
      Using meaningful file names can
      ● Facilitate error detection and recovery
        o Missing or misplaced files
      ● Aid ‘manual’ handling and checking of files
        o Name order = natural order
        o Name reflects content of file in some way
      ● But also increase maintenance and correction costs
        o Insertion or deletion of files in a sequence
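Sequential names make missing or misplaced files detectable by a script. The minimal sketch below assumes four-digit numeric names such as `0122.tif`, and the function name is hypothetical. It only finds gaps between the lowest and highest numbers present, so pages missing from either end of a volume still need a count against your metadata.

```python
import re

def sequence_problems(filenames, digits=4):
    """Report missing and repeated numbers in sequential names such as 0122.tif."""
    # Exactly `digits` digits, a dot, and an extension; other names are ignored.
    pattern = re.compile(rf"(\d{{{digits}}})\.\w+")
    numbers = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return [], []
    present = set(numbers)
    missing = [n for n in range(min(present), max(present) + 1) if n not in present]
    # A number seen twice (e.g. 0001.tif and 0001.jpg) suggests a misplaced file.
    repeated = sorted(n for n in present if numbers.count(n) > 1)
    return missing, repeated
```

For example, `sequence_problems(["0122.tif", "0123.tif", "0125.tif"])` returns `([124], [])`.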
  14. Strategies – File naming
      Also use directories to help organize files
      ● Same naming conventions apply (avoid . )
      ● Same naming benefits and cautions
      ● Nesting directories allows for richer hierarchical relationships, but may foil some automation options
      ● Limit to 500–1000 files per directory when feasible
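Both directory cautions can be checked by walking the tree. This is a minimal sketch. The 1,000-file default mirrors the upper end of the slide's range, and the function names are hypothetical.

```python
import os

def crowded_directories(root, limit=1000):
    """Yield (directory, file_count) for directories holding more than `limit` files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > limit:
            yield dirpath, len(filenames)

def dotted_directories(root):
    """Yield directory paths whose names contain a dot, which the slide advises avoiding."""
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if "." in name:
                yield os.path.join(dirpath, name)
```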
  15. Strategies – File naming
      UWDCC naming system for books:
      One directory per volume, with flexible four-digit sequential filenames.
      Directories may be grouped for multi-volume monographs, by series, by project, or several of the above.
      Example: UWMad/Yearbooks/Yrbk1972/0001.tif
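A path in the book scheme can be generated instead of typed, which keeps the zero-padding consistent. This is a minimal sketch that reproduces the example path above. The function name and its parameters are hypothetical.

```python
from pathlib import PurePosixPath

def page_path(*groups: str, volume: str, page: int, ext: str = "tif") -> PurePosixPath:
    """Build a path such as UWMad/Yearbooks/Yrbk1972/0001.tif."""
    if not 1 <= page <= 9999:
        raise ValueError("a four-digit scheme holds pages 0001-9999")
    # Grouping directories first, then one directory per volume, then the page file.
    return PurePosixPath(*groups, volume, f"{page:04d}.{ext}")
```

`page_path("UWMad", "Yearbooks", volume="Yrbk1972", page=1)` gives `UWMad/Yearbooks/Yrbk1972/0001.tif`.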
  16. Strategies – File naming
      UWDCC naming system for photographs:
      Short alphabetic prefix with a flexible serial number; ad hoc system of separation into directories, usually based on serial number.
      Example: UWArchives/uwar02/uwar02345.tif
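The slide describes the directory split as ad hoc. The rule below is only an assumption that happens to reproduce the example path: five-digit serials, with one subdirectory per block of 1,000 serial numbers. The function name is hypothetical.

```python
from pathlib import PurePosixPath

def photo_path(collection: str, prefix: str, serial: int,
               width: int = 5, per_dir: int = 1000, ext: str = "tif") -> PurePosixPath:
    """Build e.g. UWArchives/uwar02/uwar02345.tif under an ASSUMED directory rule."""
    if not 0 <= serial < 10 ** width:
        raise ValueError(f"serial must fit in {width} digits")
    # ASSUMPTION: per_dir is a power of 10; the directory keeps the leading serial digits.
    block_width = width - (len(str(per_dir)) - 1)
    block = serial // per_dir
    return PurePosixPath(collection,
                         f"{prefix}{block:0{block_width}d}",
                         f"{prefix}{serial:0{width}d}.{ext}")
```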
  17. Strategies – File naming
      Bottom line:
      If you have technical requirements for file names, follow them.
      Beyond that, choose a system that maximizes human utility, keeping in mind the balance between encoded meaning and requirements for maintenance.
  18. Strategies for creation phase
      Create high-quality digital surrogates sufficient to meet current and anticipated needs
      ● Encourages future investment in the material
  19. Strategies for creation phase
      Create backups of current work and maintain fall-back positions in case corrections are needed
      ● Reduces cost of errors
      ● Mitigates fragility and malleability of data
  20. Strategies for creation phase
      Check your work at major transitions, not just for quality issues, but also for completeness and accuracy
      ● Increases value of the collection
      ● Facilitates future processing
  21. Strategies for preservation phase
      Choose storage media that best match your resources and requirements.
      ● Make multiple copies so that you can react to failure
      ● If possible, mitigate technological risk by storing files on different types of media
      ● Mitigate risk of physical disasters by storing media in multiple locations
  22. Strategies – Storage media
      Technology | Size | Stability | Cost
      Flash storage | 4–256 GB | 5–20 years or less | $0.50/GB
      Hard drive (magnetic disk) | 1 TB – ? | 25–30 years, prone to mechanical failure | $0.05/GB +++
      Magnetic tape | 400 GB – 2.5 TB | 25–30 years | $0.01–0.50/GB
      CD-R | 630–700 MB | 100–200 years for high-quality media (MAM-A) | $2.50/disc = $3.50/GB
      DVD-R/+R | 4.7 GB | 100–200 years (?) for high-quality media | $2.50–4.00/disc = $0.50–0.85/GB
      The Cloud | 1–30 TB | ? | $0.002–0.10/GB monthly!
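The per-GB figures in the table are just price divided by capacity. The same arithmetic lets you compare a one-time purchase of duplicated discs with cloud storage that is billed every month. This is a minimal sketch. The function names are hypothetical, and any collection size you plug in is your own estimate.

```python
import math

def cost_per_gb(price_per_unit: float, capacity_gb: float) -> float:
    """E.g. a $2.50 CD-R holding 0.7 GB costs about $3.57/GB."""
    return price_per_unit / capacity_gb

def optical_plan(total_gb: float, disc_capacity_gb: float,
                 price_per_disc: float, copies: int = 2):
    """Discs per copy, total discs, and one-time cost when every disc is duplicated."""
    discs_per_copy = math.ceil(total_gb / disc_capacity_gb)
    total_discs = discs_per_copy * copies
    return discs_per_copy, total_discs, total_discs * price_per_disc

def cloud_cost(total_gb: float, price_per_gb_month: float, months: int) -> float:
    """Cloud pricing recurs: the bill grows with every month the data is kept."""
    return total_gb * price_per_gb_month * months
```

For a hypothetical 100 GB collection on DVD-R at $2.50 per disc, `optical_plan(100, 4.7, 2.50)` returns `(22, 44, 110.0)`: 22 discs per set, 44 with duplicates, $110 in total.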
  23. Strategies – Storage media
      Over its history, UWDCC has used
      ● JAZ disks
      ● Duplicate CD-R
      ● Duplicate data tapes
      ● Hard drives with duplicate data tapes
      We currently have ~18 TB of archived data
  24. Strategies – Storage media
      Recommended options for getting started
      ● CD-R or DVD-R/+R
        o Use the good stuff: MAM-A Gold Archive media
        o Always make duplicates
        o Consider supplementing with cloud storage
      ● Graduate to hard drives
        o Active RAID-enabled disks are much safer than stand-alone hardware sitting on a shelf
      ● Add tape when technology staff can support it
  25. Strategies – Storage media
      Avoid
      ● Flash drives – too unstable
      ● Reliance on the Cloud as your only archive
  26. Strategies for preservation phase
      Any time you move data to a new medium or a new physical device, verify!
      (Now that you're no longer actively working with the files, it's easy for a bad transfer to go unnoticed.)
      If the new media/device can be write-protected, do so.
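A single-file version of "copy, verify, write-protect" can be sketched with Python's standard library. The sketch compares source and copy byte for byte using `filecmp.cmp(..., shallow=False)`. It then clears the copy's write-permission bits, and on Windows Python maps that to the read-only attribute. Clearing permission bits is weaker than physically write-protecting a medium. The function name is hypothetical.

```python
import filecmp
import os
import shutil
import stat

def copy_verify_protect(src, dst) -> None:
    """Copy src to dst, compare the two byte for byte, then make dst read-only."""
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    if not filecmp.cmp(src, dst, shallow=False):
        raise OSError(f"verification failed: {dst} differs from {src}")
    # Clear owner, group and other write bits; leave read/execute bits unchanged.
    mode = os.stat(dst).st_mode
    os.chmod(dst, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```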
  27. Strategies for preservation phase
      Create checksums for each file that you archive
      ● Use now to verify files on transfer
      ● Use later to detect data degradation
      ● Also useful to determine whether files are actually the same or not
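A checksum is a short fingerprint computed from a file's contents. Matching checksums mean identical contents, whatever the files are named. This minimal sketch uses SHA-256 from Python's `hashlib`, which is one reasonable choice rather than one the slides specify. It also shows the third use listed above: spotting files that are actually the same. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20) -> str:
    """Hash a file in 1 MB chunks so large TIFFs never need to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def identical_groups(paths) -> list:
    """Group files whose contents hash identically, regardless of their names."""
    by_hash = defaultdict(list)
    for p in paths:
        by_hash[file_checksum(p)].append(Path(p))
    return [group for group in by_hash.values() if len(group) > 1]
```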
  28. Strategies for preservation phase
      Keep track (metadata!) of where your files are archived
      ● Material that can't be located has not been preserved
      ● Will help to prioritize future preservation actions
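A location log does not need special software. This sketch keeps one CSV row per archived copy and then lists files held on fewer than two separate volumes, which is one way to prioritize future preservation work. The column names, file name and functions are all hypothetical. Fold them into whatever metadata you already keep.

```python
import csv
from pathlib import Path

# HYPOTHETICAL columns: one row per copy of a file on a specific medium.
FIELDS = ["file", "checksum", "medium", "volume_label", "location", "date_written"]

def record_copy(registry: Path, **row) -> None:
    """Append one archived copy to the CSV registry, writing a header if it is new."""
    is_new = not registry.exists()
    with open(registry, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

def copies_of(registry: Path, filename: str) -> list:
    """All recorded copies of one file."""
    with open(registry, newline="") as f:
        return [r for r in csv.DictReader(f) if r["file"] == filename]

def under_copied(registry: Path, minimum: int = 2) -> list:
    """Files recorded on fewer than `minimum` distinct volume/location pairs."""
    copies = {}
    with open(registry, newline="") as f:
        for r in csv.DictReader(f):
            copies.setdefault(r["file"], set()).add((r["volume_label"], r["location"]))
    return sorted(name for name, places in copies.items() if len(places) < minimum)
```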
  29. UWDCC workflow
      1. Metadata first: checklist for subsequent work
      2–5. Working files organized under three directories: original, inprocess, final
        o Initial scan to ‘original’ - never edited
        o Copy to ‘inprocess’ - cleaned up for access
        o Finished version to ‘final’ - metadata check
      6. Distribution files created from ‘final’ masters
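The three-directory layout can be set up by a script, and the script can enforce "never edited" by making scans in `original` read-only before copying them into `inprocess`. This is a sketch under those assumptions, not the UWDCC's AppleScript tooling. The function names are hypothetical.

```python
import shutil
import stat
from pathlib import Path

STAGES = ("original", "inprocess", "final")

def start_project(root) -> dict:
    """Create root/original, root/inprocess and root/final; return them by name."""
    dirs = {stage: Path(root) / stage for stage in STAGES}
    for d in dirs.values():
        d.mkdir(parents=True, exist_ok=True)
    return dirs

def lock_original_and_stage(dirs: dict) -> None:
    """Make each scan in 'original' read-only, then copy it to 'inprocess' for cleanup."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for scan in sorted(dirs["original"].iterdir()):
        if not scan.is_file():
            continue
        scan.chmod(scan.stat().st_mode & ~write_bits)
        target = dirs["inprocess"] / scan.name
        if not target.exists():
            # copyfile copies contents only, so the working copy stays writable.
            shutil.copyfile(scan, target)
```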
  30. UWDCC workflow
      7. ‘Click-through’ all images in test mode
      7a. Once all is correct: public release!
      8a. Recheck files against metadata
      8b. Create checksums for local files
      8c. Transfer files to archival media
      8d. Verify checksums for transferred files
      9. Now safe to delete working copies
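Steps 8b to 8d can be tied together with a checksum manifest. You write it before transfer, then check the archival copy against it. This minimal sketch uses SHA-256. The manifest's two-space "checksum  path" layout mirrors the text-mode output of GNU `sha256sum`. Keep the manifest outside the folder it describes. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(folder: Path, manifest: Path) -> None:
    """Step 8b: one 'checksum  relative/path' line per file under `folder`."""
    entries = [f"{sha256(p)}  {p.relative_to(folder).as_posix()}\n"
               for p in sorted(folder.rglob("*")) if p.is_file()]
    manifest.write_text("".join(entries), encoding="utf-8")

def verify_manifest(copy_root: Path, manifest: Path) -> list:
    """Step 8d: list files on the copy that are missing or whose checksum differs."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        digest, rel = line.split("  ", 1)
        target = copy_root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256(target) != digest:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Only when `verify_manifest` returns an empty list for the archival copy is step 9, deleting working copies, safe.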
  31. UWDCC Tools
      ● Microsoft Excel or FileMaker Pro for metadata entry (sometimes Access)
      ● Variety of scanners chosen to maximize flexibility
      ● Manufacturer’s software / VueScan
      ● GoldenThread (ISA) for evaluating scanner quality
      ● Adobe Photoshop for image editing
      ● AppleScript for custom automation of various workflow tasks
      ● Built-in Unix functions for checksums, file-handling
  32. Other tool options
      Image editing: GIMP (Windows, Mac, Linux); Paint.net (Windows)
      Automation: VBScript, JScript, VBA (Windows); Python (Windows, Mac, Linux)
      Checksum and verification: Fastsum; Checksum (corz.org) (Windows)
  33. Summary
      Both phases
      ★ Use broadly supported standard file formats (tiff, wav)
      ★ Develop consistent workflow, document special cases
      ★ File naming - follow technical rules; design it for humans
        ○ Balance between using filename for meaning and keeping it easy to maintain
      Creation phase
      ★ Start with high-quality scans of source documents
      ★ Make backups of current work, maintain fall-back positions
      ★ Check work at major transitions
      Preservation phase
      ★ Storage media
        ○ Start: CD-R or DVD-R/+R, maybe supplement with Cloud
        ○ Step up: hard drives
        ○ Add tape if can support
        (Avoid flash drives and Cloud as sole archive)
      ★ Verify anytime you move things
      ★ Write-protect if you can
      ★ Create checksums
      ★ Metadata: Know what you have, where it is, and what you can do with it
  34. Selected references and reading
      General DP
      http://digitalpowrr.niu.edu/wp-content/uploads/2014/05/Overwhelmed-to-action.rinehart_prudhomme_huot_2014.pdf
      http://commons.lib.niu.edu/handle/10843/13610
      http://files.eric.ed.gov/fulltext/ED426715.pdf
      https://en.wikipedia.org/wiki/Digital_preservation
      Filenaming
      http://www.jiscdigitalmedia.ac.uk/guide/choosing-a-file-name
      Storage media
      http://www.nps.gov/museum/publications/conserveogram/22-05.pdf
  35. Selected tools and resources
      Scanning
      http://www.hamrick.com (Vuescan)
      http://www.imagescienceassociates.com/ (GoldenThread)
      Image editing
      http://www.gimp.org
      http://www.getpaint.net/index.html
      Archival CDs and DVDs
      http://www.mam-a-store.com
      Scripting
      http://www.pctools.com/guides/article/id/2/page/1/
      https://www.python.org
      http://macosxautomation.com/applescript/firsttutorial/index.html
      Checksum tools
      http://www.fastsum.com
      http://corz.org/windows/software/checksum/
  36. Questions?
  37. UWDC & Digital as Preservation
      The UWDCC recently launched a pilot project in collaboration with our Preservation Department to develop standards and guidelines for utilizing digitization as a preservation medium at UW-Madison.
      This presentation focuses primarily on workflow and only on changes we can and have implemented in our current environment for preservation-level projects.
      (Image: Detail from page 2 of ‘The modern priscilla’ Vol. XXXVI, No. V (July, 1922). The Dovie Horvitz Collection.)
  38. UWDC & Digital as Preservation
      Equipment
      Type | Hardware | Software
      High Speed scanning | Panasonic KV-S3065C High Speed Color Scanner | Reliable Throughput Image Viewer (RTIV)
      Flatbed scanning | Epson Expression 10000XL (includes one with Epson A3 Transparency adapter); Epson Expression 11000XL | Epson Scan Utility
      Overhead Reprographic scanning | BetterLight Super 6K-HS Digital Scanning Back | ViewFinder camera control software
      Slide scanning | Nikon Super COOLSCAN 5000 ED film scanner | VueScan scanner software
      Digital photography | |
  39. UWDC & Digital as Preservation
      The basics:
      ● What is Preservation? - Extending the useful life of our stuff.
      ● Why do we do it? Protect, Represent, Transcend.
      Do something with those berries before they spoil! Pickle something! In essence, preservation is extending the useful life of our stuff. Don’t let those veggies just turn into compost.
      Protect! Secure the value and usefulness of our resources. Taste the summer sunshine in your veggies when you eat them out of season.
      Represent! We want our digital formats to be an authentic representation of the original. Pickles and jam exist only when cucumbers and berries are transformed into something new.
      Transcend! Preserve originals to take advantage of and/or discover new uses.
  40. UWDC & Digital as Preservation
      Prep:
      1. Identify
         What do we have that needs preserving? Where did it come from?
      2. Evaluate & Assess
         Make sure our equipment and ingredients are up to the preservation process. Figure out how much we can handle at one time.
      3. Select
         Condition: Does one thing spoil faster than another?
         High use: Which items circulate the most?
         Scarcity: What are others not preserving?
      4. Review your recipe
         Consult the cookbooks (in our case FADGI) and make sure you’ve read through your recipe. Have everything you need before you start.
      Steps 1 & 3 handled by our Preservation Department. Steps 2 & 4 done by UWDCC.
  41. UWDC & Digital as Preservation
      What did this look like at UWDC?
      ● Researched current literature - focus on FADGI.
      ● Established baseline, optimum performance data for hardware - GoldenThread
  42. UWDC & Digital as Preservation
      FADGI = whoa… Lots to digest! Our takeaways:
      Evaluate and Assess our digitization environment & tweak our recipe
      ● Quantifying Scanner Performance
      ● Targets and software to use for this: GoldenThread
      ● Color Management
      Appendix A: Digitizing for Preservation Reformatting of Photographs
      Compare characteristics of preservation vs. production master files.
  43. UWDC & Digital as Preservation
      Using GoldenThread
      ● Flatbeds and Epson Scan software - customizing the color balance settings per scanner
      ● BetterLights and ViewFinder software - custom tone curves per set-up, per scanner
  44. UWDC & Digital as Preservation
      Using targets and software to determine performance
      3s: +/- 6 aim points
      4s: +/- 3 aim points
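The tolerances on slide 44 amount to a simple pass/fail comparison of measured target values against aim values. The sketch below reads "3s" and "4s" as FADGI 3-star and 4-star levels, which is an assumption. Patch names and values are hypothetical. Real evaluations in GoldenThread cover many more parameters and use its own report format, which this code does not parse.

```python
# ASSUMPTION: "3s"/"4s" = FADGI 3-star/4-star levels; allowed deviation from aim.
TOLERANCE = {3: 6, 4: 3}

def deviations(measured: dict, aims: dict, stars: int) -> dict:
    """Return {patch: measured - aim} for patches outside the tolerance for `stars`."""
    limit = TOLERANCE[stars]
    return {patch: measured[patch] - aim
            for patch, aim in aims.items()
            if abs(measured[patch] - aim) > limit}

def best_level(measured: dict, aims: dict):
    """Highest star level whose tolerance every patch meets, or None if none is met."""
    for stars in sorted(TOLERANCE, reverse=True):
        if not deviations(measured, aims, stars):
            return stars
    return None
```

A patch that is 4 away from its aim value fails the +/- 3 tolerance but passes +/- 6, so that scan would rate at the 3 level.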
  45. UWDC & Digital as Preservation
      Established baseline, optimum performance.
      Establish maintenance schedule.
  46. UWDC & Digital as Preservation
      Monthly:
      ● Check BetterLight and flatbed performance against baseline performance with GoldenThread
      Quarterly:
      ● Calibrate monitors on reformatting supervisors’ computers
      ● Zig-Align BetterLights (or more frequently if needed)
      Biannually:
      ● Calibrate scanning station monitors
      ● Calibrate and characterize BetterLights (create new baseline tone curves in the software)
      ● Calibrate and characterize flatbeds (update histogram settings)
  47. UWDC & Digital as Preservation
      Access recipe:
      ● 300 dpi
      ● 24-bit color (or grayscale on our high-speed scanner)
      ● Flatbed, BetterLight or high-speed scanner
      ● Custom tone curves in BetterLight (BL) software per set-up
      ● Custom histograms on flatbeds
      ● Cropping borders based on project
      ● “Cooked” masters archived
      The original object itself is the preservation master (you intend to hold onto it), and digital surrogates are for access.
  48. UWDC & Digital as Preservation
      Preservation recipe:
      ● 400 dpi
      ● 24-bit color
      ● BetterLight only (for now)
      ● Custom tone curves per project/issue
      ● Object target captured per page/scan
      ● Device target per project/issue/day
      ● Always crop outside the pages
      ● “Raw” and “Cooked” masters archived
      The digital version is expected to be the preservation master in the absence of the original object; therefore, the highest possible fidelity is desired.
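Going from 300 to 400 dpi and archiving both "raw" and "cooked" masters multiplies storage needs. Uncompressed 24-bit pixel data takes width × height × 3 bytes. This is a rough sketch. The 8.5 × 11 inch page is an illustrative assumption, and real TIFF files add a small header overhead on top of the pixel data.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bits_per_pixel: int = 24) -> int:
    """Approximate uncompressed image size: pixel data only, no file headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Illustrative page size only (US Letter, 8.5 x 11 inches).
access = uncompressed_bytes(8.5, 11, 300)            # one "cooked" master
preservation = 2 * uncompressed_bytes(8.5, 11, 400)  # "raw" + "cooked" masters
print(f"access: {access / 1e6:.1f} MB, preservation: {preservation / 1e6:.1f} MB")
# prints: access: 25.2 MB, preservation: 89.8 MB
```

Under these assumptions, each preservation-level page needs roughly three and a half times the storage of an access-level page.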
