Moving an Archive from Tape to Disk
A Case Study at ICPSR

IASSIST 2008
Stanford University

Bryan Beecher
IT Director
ICPSR

Overview of today’s talk
• Where we were
    Background info
    Digital Preservation @ ICPSR in 2006
• Where we went
    Digital objects
    Physical objects
• Where we want to go
    Fedora

What is ICPSR?
• Collect digital objects – primarily social science data
• Add value to the objects
• Preserve and disseminate
• Other programs too
    Summer Program in Quantitative Methods
    Digital Preservation workshop
• Clients
    Higher education
    Data producers who don’t want to preserve or disseminate their data themselves

A peek inside ICPSR
• Computer & Network Services
    ICPSR’s technology shop
    System and network management
    Software, service, and database development
• Data Library
    Manage off-line storage of digital objects
    Manage off-site collection of paper records
    Service staff requests for digital and physical objects
• The two groups historically had little interaction

DigiPres at ICPSR in 2006
• The Good
    Two copies of each digital object; one off-site
    Metadata stored in a relational database
    Stable processes
    Large collection of “old stuff” (paper records and media)
• The Bad
    Using low-density tape for archival storage
    Metadata not stored with the objects
    Manual processes
    Large collection of “old stuff” (paper records and media)

DigiPres at ICPSR in 2006
• September 2006
    ICPSR hires its first Digital Preservation Officer
       • Nancy McGovern
    Data Library team joins Computer & Network Services
• DPO sets policies
• The newly expanded CNS implements those policies and operates the technology

Policy changes
• Do NOT need to preserve original media
    Preservation commitment is to the intellectual content
    Media is only a container holding that content
• Do NOT need to preserve paper records except where they have archival value
• Do need a digital copy outside of Ann Arbor
• Do need to collect key metadata about deposits
    Provenance
    Digital fingerprints
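
Collecting “digital fingerprints” for a deposit can be illustrated with a minimal Python sketch. It assumes SHA-256 as the fingerprint algorithm (the slides do not name one) and that a deposit sits in a single directory; `sha256_file` and `fingerprint_deposit` are hypothetical names, and provenance would be recorded separately.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of one file, read in chunks to bound memory use.

    Assumption: SHA-256 is the fingerprint algorithm; the slides do not specify one.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel stops at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fingerprint_deposit(root: Path) -> Dict[str, str]:
    """Map every file under root (as a POSIX-style relative path) to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

The resulting manifest is small enough to store next to the files, which addresses the 2006 gap of metadata not being stored with the objects.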

The Plan
• Track service requests via help desk software
    Who’s asking for materials?
    How many requests for digital materials v. paper v. both?
    How many requests each month?
• Wherever possible, automate digital preservation operations
    Completeness and correctness increase
    Staff become available for retrospective projects
    Also automate ICPSR staff access to materials
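
The questions the help-desk tracking is meant to answer (digital v. paper v. both, and volume per month) reduce to a simple tally. A minimal sketch, assuming each ticket can be exported as a month plus a request kind; the `ServiceRequest` record and `monthly_tally` function are hypothetical and not part of any particular help-desk product.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class ServiceRequest(NamedTuple):
    """One help-desk ticket; these fields are illustrative assumptions."""
    month: str  # e.g. "2007-03"
    kind: str   # assumed values: "digital", "paper", or "both"


def monthly_tally(requests: Iterable[ServiceRequest]) -> dict[str, Counter]:
    """Count requests per month, broken down by the kind of material requested."""
    tally: dict[str, Counter] = {}
    for req in requests:
        # Create the month's counter on first sight, then bump the kind.
        tally.setdefault(req.month, Counter())[req.kind] += 1
    return tally
```

Summing a month's counter gives the total requests for that month; comparing kinds shows how much demand is for digital versus paper materials.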

The Plan (more)
• Transition ALL digital content from tape to disk
    A copy on tape too is OK, but not primary copies
       • Expensive to access
       • Difficult to tell if copy A and copy B are in sync
• Discard extraneous administrative documents
    Just the “low hanging fruit”
• Turn over remaining documents to records management professionals
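
With primary copies on disk, the “are copy A and copy B in sync?” question becomes a cheap comparison. A minimal sketch, assuming each copy can produce a manifest mapping relative paths to fingerprints (for example, SHA-256 digests); `compare_copies` and `SyncReport` are hypothetical names.

```python
from typing import Dict, List, NamedTuple

# Assumed manifest shape: relative path -> hex digest of that file.
Manifest = Dict[str, str]


class SyncReport(NamedTuple):
    only_in_a: List[str]
    only_in_b: List[str]
    mismatched: List[str]

    @property
    def in_sync(self) -> bool:
        """True only if both copies hold the same files with the same digests."""
        return not (self.only_in_a or self.only_in_b or self.mismatched)


def compare_copies(copy_a: Manifest, copy_b: Manifest) -> SyncReport:
    """Report files present in only one copy and files whose fingerprints differ."""
    only_a = sorted(copy_a.keys() - copy_b.keys())
    only_b = sorted(copy_b.keys() - copy_a.keys())
    mismatched = sorted(
        path for path in copy_a.keys() & copy_b.keys() if copy_a[path] != copy_b[path]
    )
    return SyncReport(only_a, only_b, mismatched)
```

On tape, the same check would require mounting and reading both copies end to end; on disk it only needs the two manifests.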

Interlude - Comcast
• An Internet connection at the Warehouse would be very helpful
    Access to databases, Intranet
• Thought we might purchase a broadband connection
• We started with Comcast…
    Comcast: “We’ll need to include an installation surcharge to cover a few extra installation costs.”
    ICPSR: “How much?”

Our reaction
• Comcast: “Thirty-two thousand dollars.”
• ICPSR: “Uh, no.”
• The Warehouse now has an AT&T DSL connection

Execution – moving to disk
• DLT tape – bulk of our content – approx 275 unique tapes
    Two copies of each tape
       • ICPSR HQ
       • The Warehouse
    Each tape holds 20 GB to 40 GB
• During Feb–Jun 2007 ICPSR moved the content of these tapes to spinning disk
• Starting in Jan 2007 ICPSR stopped using DLT tape for archival storage
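
One way to move content off tape safely is to verify every file after copying it to disk. A minimal sketch, assuming the tape contents have already been read into a staging file and using SHA-256 for the check; `migrate_file` and `sha256_of` are hypothetical helpers, not ICPSR’s actual migration tooling.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate_file(source: Path, dest_root: Path, relative: str) -> str:
    """Copy one staged file onto disk and verify the copy bit for bit.

    Returns the digest so it can be recorded alongside the file.
    """
    target = dest_root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)  # copies contents plus timestamps
    before, after = sha256_of(source), sha256_of(target)
    if before != after:
        target.unlink()  # do not leave a corrupt copy behind
        raise OSError(f"fixity mismatch for {relative}")
    return after
```

Returning the digest lets the same pass that moves the content also populate the fingerprint metadata, rather than requiring a second read of the data.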

Execution – moving to disk
• Approx 5 TB of unique content across all tapes
• How many copies?
    (1) ICPSR – on-line
    (1) ICPSR – off-line
    (1–3) Chronopolis (SDSC, NCAR, UMd)
    (2) IU HPSS
    (0–5) LOCKSS-based, NDIIPP-funded syndicated storage
    More?
• Intending to destroy the DLT media at end of 2008

Execution – moving to disk
• Also have 2000 cartridge (3480) and 9-track tapes
• Have been reading 50/week for many months now; will finish these before the end of 2008
    High success rate for reading (> 80%)
• Also had a stash of over 10k tapes that had already been migrated, but not discarded
    For this we used extra special, extra gentle treatment……

Carefully removing the tapes

Who ya gonna call?

Before the harvest

After the harvest

Costs - media
[Chart: media costs, “Were” v. “Now”, in thousands of dollars ($0 to $40k), for master copy per TB, backup copy per TB, and media management]

Costs – media (notes)
• Were spending approx
    $2000/TB/copy on DLT tape
    $65k/year staff to read, write, migrate and manage tapes
• Now spending approx
    $2000/TB/copy for “expensive” SATA disk in our EMC
    $100/TB/copy for LTO-3 tape
    $0/TB/copy for off-site, on-line copies with our friends
    Staff cost for plain old file and tape management can live on the margins
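
The per-TB figures above can be turned into a rough media-cost comparison for the approx 5 TB collection. This sketch assumes “Were” means two DLT copies and “Now” means one SATA disk copy, one LTO-3 copy and partner-hosted off-site copies at $0; it ignores staff costs such as the $65k/year for tape handling.

```python
# Approximate media costs per TB per copy, in US dollars, as quoted on the slide.
DLT_TAPE = 2000
SATA_DISK = 2000
LTO3_TAPE = 100
PARTNER_ONLINE = 0

ARCHIVE_TB = 5  # "approx 5TB of unique content across all tapes"


def media_cost(terabytes: float, per_copy_rates: list[float]) -> float:
    """Total media cost: each list entry is the $/TB rate for one copy held."""
    return terabytes * sum(per_copy_rates)


# Assumed copy configurations; the slide gives rates, not these exact mixes.
were = media_cost(ARCHIVE_TB, [DLT_TAPE, DLT_TAPE])
now = media_cost(ARCHIVE_TB, [SATA_DISK, LTO3_TAPE, PARTNER_ONLINE])

print(f"Were: ${were:,.0f}  Now: ${now:,.0f}")  # Were: $20,000  Now: $10,500
```

Under these assumptions the media bill drops by about half while the number of copies goes up, and that is before counting the staff time that tape handling used to consume.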

Execution – paper documents
• Stored at the Warehouse
• 3200 sq ft facility located near Ann Arbor airport
    2500 sq ft manufacturing space
    600 sq ft of office space (the three “Front Rooms”)
    100 sq ft of kitchenette, rest room
• $35k/year for rent; $5k for utilities

Bird’s eye – 1 of 3

Bird’s eye – panning right

Bird’s eye – panning right

Execution – paper documents
• Phase I (“clean up”)
    Identify, gather and recycle paper with no archival value
       • File listings
       • Census 2000
    Completed in 2007; recycled 40 cubic yards
• Phase II (“clean out”)
    Consolidate administrative and archival materials into acid-free folders stored in archival-quality boxes
    In progress; expect to complete by the end of August 2008

Costs – paper documents
[Chart: paper-document costs, “Current” v. “Planned”, in thousands of dollars ($0 to $200k), for storage & management, retrieval & returns, and supplies & misc.]

Execution – automation
• Digital Object Database
    Database of metadata about every identified file in the archives
       • Digital fingerprint
       • Location
       • Source
• Plugged into our ingest system and our dissemination system
• Powers some really useful tools…
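
A Digital Object Database holding the three fields on this slide could start as a single table. A minimal sketch using Python’s built-in `sqlite3`; the table, column and function names are assumptions, and a production system would more likely live in the relational database the ingest and dissemination systems already use.

```python
import sqlite3

# Hypothetical schema: one row per known copy of a file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS digital_object (
    object_id   INTEGER PRIMARY KEY,
    fingerprint TEXT NOT NULL,  -- e.g. a SHA-256 hex digest
    location    TEXT NOT NULL,  -- where this copy lives (path or URI)
    source      TEXT NOT NULL   -- provenance, e.g. the deposit or tape it came from
);
CREATE INDEX IF NOT EXISTS idx_object_fingerprint ON digital_object (fingerprint);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the catalog and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, fingerprint: str, location: str, source: str) -> int:
    """Record one copy of a file and return its row id."""
    cur = conn.execute(
        "INSERT INTO digital_object (fingerprint, location, source) VALUES (?, ?, ?)",
        (fingerprint, location, source),
    )
    conn.commit()
    return cur.lastrowid


def locations_of(conn: sqlite3.Connection, fingerprint: str) -> list[str]:
    """All known locations holding content with this fingerprint."""
    rows = conn.execute(
        "SELECT location FROM digital_object WHERE fingerprint = ? ORDER BY location",
        (fingerprint,),
    )
    return [row[0] for row in rows]
```

Indexing on the fingerprint makes “where are all the copies of this file?” a single lookup, which is the kind of query staff-facing tools can build on.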

Execution – automation
• Goodies for ICPSR staff
    Download page has extra knob to view ALL files
    Intranet tools that link
       • Internal Study Tracking System
       • Public-facing study download system
       • Private-facing digital preservation system
• Immediate and direct access to all digital objects

Looking forward
• Lots of good progress so far…
    Better access for ICPSR staff
    More robust preservation
    Reduced costs
• But does the IT guy ever give up $ once he gets it?
• But not done yet
    Still need a “proper” digital preservation system
       • Fedora

Looking forward (continued)
• Long-term, off-site, on-line copies
    Heavily subsidized today
    What about the future costs?
• What if we start preserving and disseminating much larger digital objects?
• Restricted-access materials
    Balancing good preservation v. securing sensitive data

Digital Preservation web site

Questions?