Moving an Archive from Tape to Disk
A Case Study at ICPSR

IASSIST 2008
Stanford University

Bryan Beecher
IT Director
ICPSR
Overview of today’s talk
• Where we were
   Background info
   Digital Preservation @ ICPSR in 2006
• Where we went
   Digital objects
   Physical objects
• Where we want to go
   Fedora


What is ICPSR?
• Collect digital objects – primarily
  social science data
• Add value to the objects
• Preserve and disseminate
• Other programs too
     Summer Program in Quantitative
      Methods
     Digital Preservation workshop
• Clients
     Higher-education
     Data producers who don’t want to
      preserve or disseminate

A peek inside ICPSR
• Computer & Network Services
    ICPSR’s technology shop
    System and network management
    Software, service, and database development
• Data Library
    Manage off-line storage of digital objects
    Manage off-site collection of paper records
    Service staff requests for digital and physical objects
• Historically had little interaction

DigiPres at ICPSR in 2006
• The Good
    Two copies of each digital object; one off-site
    Metadata stored in a relational database
    Stable processes
    Large collection of “old stuff” (paper records and media)
• The Bad
    Using low-density tape for archival storage
    Metadata not stored with the objects
    Manual processes
    Large collection of “old stuff” (paper records and media)




DigiPres at ICPSR in 2006
• September 2006
    ICPSR hires its first Digital
     Preservation Officer
       • Nancy McGovern
    Data Library team joins
     Computer & Network
     Services
• DPO sets policies
• The newly expanded CNS
  implements those policies
  and operates the technology

Policy changes
• Do NOT need to preserve original media
    Preservation commitment is to the intellectual content
    Media is only a container holding that content
• Do NOT need to preserve paper records except
  where there is value
• Do need a digital copy outside of Ann Arbor
• Do need to collect key metadata about deposits
    Provenance
    Digital fingerprints

The Plan
• Track service requests via help desk software
    Who’s asking for materials?
    How many requests for digital materials vs. paper vs. both?
    How many requests each month?
• Wherever possible automate digital preservation
  operations
    Completeness and correctness increase
    Staff become available for retrospective projects
    Also automate ICPSR staff access to materials

The Plan (more)
• Transition ALL digital content from tape to disk
    An extra copy on tape is OK, but not the primary copies
      • Expensive to access
      • Difficult to tell if copy A and copy B are in sync
• Discard extraneous administrative documents
    Just the “low hanging fruit”
• Turn over remaining documents to records
  management professionals
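
One way to see why on-line copies are easier to keep in sync than tape copies: once every file has a digital fingerprint, comparing copy A with copy B is a dictionary comparison. The sketch below is illustrative only. It assumes each copy is summarized as a mapping from relative path to fingerprint; that structure and the names `compare_copies` and `SyncReport` are hypothetical and not taken from ICPSR's systems.

```python
from typing import Dict, List, NamedTuple


class SyncReport(NamedTuple):
    """Differences between two copies of the same collection."""
    missing_from_b: List[str]  # paths present in copy A only
    missing_from_a: List[str]  # paths present in copy B only
    mismatched: List[str]      # paths in both copies whose fingerprints differ


def compare_copies(copy_a: Dict[str, str], copy_b: Dict[str, str]) -> SyncReport:
    """Compare two {relative path: fingerprint} manifests (hypothetical format)."""
    paths_a, paths_b = set(copy_a), set(copy_b)
    return SyncReport(
        missing_from_b=sorted(paths_a - paths_b),
        missing_from_a=sorted(paths_b - paths_a),
        mismatched=sorted(p for p in paths_a & paths_b if copy_a[p] != copy_b[p]),
    )
```

An empty report (three empty lists) would mean the two copies agree file for file.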


Interlude - Comcast
• An Internet connection at the Warehouse would be
  very helpful
    Access to databases, Intranet
• Thought we might purchase a broadband connection
• We started with Comcast….
    Comcast: “We’ll need to include an installation surcharge
     to cover a few extra installation costs.”
    ICPSR: “How much?”



Our reaction
• Comcast: “Thirty-two
  thousand dollars.”
• ICPSR: “Uh, no.”
• The Warehouse now has an
  AT&T DSL connection




Execution – moving to disk
• DLT tape – bulk of our content – approx 275 unique tapes
    Two copies of each tape
      • ICPSR HQ
      • The Warehouse
    Each tape holds 20 GB to 40 GB
• During Feb – Jun 2007 ICPSR moved the content of
  these tapes to spinning disk
• Starting in Jan 2007 ICPSR stopped using DLT tape
  for archival storage
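
As a rough sanity check on the numbers above, the arithmetic below estimates the total capacity of the unique DLT tapes. It is a sketch under assumptions: decimal units (1 TB = 1000 GB) and every tape sized at either the low or the high end of the stated range. The function name is illustrative.

```python
TAPE_COUNT = 275   # approx number of unique DLT tapes
GB_PER_TB = 1000   # assumption: decimal units


def total_capacity_tb(tape_count: int, gb_per_tape: float) -> float:
    """Total capacity in TB if every tape holds gb_per_tape gigabytes."""
    return tape_count * gb_per_tape / GB_PER_TB


low_tb = total_capacity_tb(TAPE_COUNT, 20)
high_tb = total_capacity_tb(TAPE_COUNT, 40)
print(f"{low_tb:.1f} TB to {high_tb:.1f} TB")  # 5.5 TB to 11.0 TB
```

The approx 5 TB of unique content reported in this talk sits just under the low end of that range, which is consistent with tapes that were not all filled to capacity.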

Execution – moving to disk
• Approx 5TB of unique content across all tapes
• How many copies?
      (1) ICPSR – on-line
      (1) ICPSR – off-line
      (1-3) Chronopolis (SDSC, NCAR, UMd)
      (2) IU HPSS
      (0-5) LOCKSS-based, NDIIPP-funded syndicated storage
      More?
• Intending to destroy the DLT media at end of 2008

Execution – moving to disk
• Also have 2000 cartridge (3480) and 9-track tapes
• Have been reading 50/week for many months now;
  will finish these before the end of 2008
    High success rate for reading (> 80%)
• Also had a stash of over 10k tapes that had already
  been migrated, but not discarded
    For this we used extra special, extra gentle treatment……




Carefully removing the tapes




Who ya gonna call?




Before the harvest




After the harvest




Costs - media

[Bar chart, numbers in thousands (axis 0 to 40): master copy per TB, backup copy per TB, and media management, comparing “Were” and “Now”]

Costs – media (notes)
• Were spending approx
   $2000/TB/copy on DLT tape
   $65k/year staff to read, write, migrate and manage tapes
• Now spending approx
     $2000/TB/copy for “expensive” SATA disk in our EMC
     $100/TB/copy for LTO-3 tape
     $0/TB/copy for off-site, on-line copies with our friends
     Staff cost for plain old file and tape management can live on
      the margins
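
To make the per-TB figures concrete, the sketch below applies them to the approx 5 TB of unique content mentioned earlier in the talk. It covers media only and leaves out staff cost. The "now" case assumes one copy on SATA disk, one on LTO-3 tape, and one free off-site copy; that mix is an assumption for illustration, since the slides do not say how many copies sit on each medium.

```python
CONTENT_TB = 5  # approx unique content, from the talk


def media_cost(cost_per_tb_by_copy, content_tb=CONTENT_TB):
    """Sum media cost in dollars across copies, given each copy's $/TB."""
    return sum(cost * content_tb for cost in cost_per_tb_by_copy)


# Before: two DLT copies at approx $2000/TB each.
before = media_cost([2000, 2000])

# Now (assumed mix): SATA disk, LTO-3 tape, and a free off-site copy.
after = media_cost([2000, 100, 0])

print(before, after)  # 20000 10500
```

Under these assumptions the media bill roughly halves while the number of copies goes from two to three.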

Execution – paper documents
• Stored at the Warehouse
• 3200 sq ft facility located near Ann Arbor airport
    2500 sq ft manufacturing space
    600 sq ft of office space (the three “Front Rooms”)
    100 sq ft of kitchenette, rest room
• $35k/year for rent; $5k for utilities




Bird’s eye – 1 of 3




Bird’s eye – panning right




Bird’s eye – panning right




Execution – paper documents
• Phase I (“clean up”)
    Identify, gather and recycle paper with no archival value
       • File listings
       • Census 2000
    Completed in 2007; recycled 40 cubic yards
• Phase II (“clean out”)
    Consolidate Administrative and Archival materials into
     acid-free folders stored in archival-quality boxes
    In progress; expect to complete by the end of August 2008

Costs – paper documents
[Bar chart, numbers in thousands (axis $0 to $200): Storage & Management, Retrieval & Returns, and Supplies & Misc, comparing “Current” and “Planned”]

Execution – automation
• Digital Object Database
    Database of metadata about every identified file in the
     archives
      • Digital fingerprint
      • Location
      • Source
• Plugged into our ingest system and our
  dissemination system
• Powers some really useful tools…
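
A minimal sketch of what one row in such a database might hold, and how its fingerprint could be computed. The slides do not name the fingerprint algorithm or the schema, so SHA-256, the `ObjectRecord` fields, and the function names here are all assumptions chosen for illustration.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ObjectRecord:
    """Hypothetical metadata row for one file in the archive."""
    fingerprint: str  # hex digest of the file contents
    location: str     # where this copy of the file lives
    source: str       # where the file came from (provenance)


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe(path: Path, source: str) -> ObjectRecord:
    """Build a record for one file; the caller supplies the source label."""
    return ObjectRecord(fingerprint(path), str(path), source)
```

Recomputing the fingerprint later and comparing it with the stored value is one way to detect a file that has changed or been corrupted.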

Execution – automation
• Goodies for ICPSR staff
    Download page has extra knob to view ALL files
    Intranet tools that link
       • Internal Study Tracking System
       • Public-facing study download system
       • Private-facing digital preservation system
• Immediate and direct access to all digital objects



Looking forward
• Lots of good progress so far…
    Better access for ICPSR staff
    More robust preservation
    Reduced costs
      • But does the IT guy ever give up $ once he gets it?
• But not done yet
    Still need a “proper” digital preservation system
      • Fedora



Looking forward (continued)
• Long-term, off-site, on-line copies
    Heavily subsidized today
    What about the future costs?
      • What if we start preserving and disseminating much
        larger digital objects?
• Restricted-access materials
    Balancing good preservation v. securing sensitive data




Digital Preservation web site




Questions?





Moving an Archive from Tape to Disk: A Case-Study at ICPSR

  • 1. Moving an Archive from Tape to Disk: A Case Study at ICPSR
      IASSIST 2008, Stanford University
      Bryan Beecher, IT Director, ICPSR
  • 2. Overview of today’s talk
      • Where we were
          - Background info
          - Digital Preservation @ ICPSR in 2006
      • Where we went
          - Digital objects
          - Physical objects
      • Where we want to go
          - Fedora
  • 3. What is ICPSR?
      • Collect digital objects – primarily social science data
      • Add value to the objects
      • Preserve and disseminate
      • Other programs too
          - Summer Program in Quantitative Methods
          - Digital Preservation workshop
      • Clients
          - Higher education
          - Data producers who don’t want to preserve or disseminate
  • 4. A peek inside ICPSR
      • Computer & Network Services
          - ICPSR’s technology shop
          - System and network management
          - Software, service, and database development
      • Data Library
          - Manage off-line storage of digital objects
          - Manage off-site collection of paper records
          - Service staff requests for digital and physical objects
      • Historically had little interaction
  • 5. DigiPres at ICPSR in 2006
      • The Good
          - Two copies of each digital object; one off-site
          - Metadata stored in a relational database
          - Stable processes
          - Large collection of “old stuff” (paper records and media)
      • The Bad
          - Using low-density tape for archival storage
          - Metadata not stored with the objects
          - Manual processes
          - Large collection of “old stuff” (paper records and media)
  • 6. DigiPres at ICPSR in 2006
      • September 2006
          - ICPSR hires its first Digital Preservation Officer
              • Nancy McGovern
          - Data Library team joins Computer & Network Services
      • DPO sets policies
      • The newly expanded CNS implements those policies and operates the technology
  • 7. Policy changes
      • Do NOT need to preserve original media
          - Preservation commitment is to the intellectual content
          - Media is only a container holding that content
      • Do NOT need to preserve paper records except where there is value
      • Do need a digital copy outside of Ann Arbor
      • Do need to collect key metadata about deposits
          - Provenance
          - Digital fingerprints
  • 8. The Plan
      • Track service requests via help desk software
          - Who’s asking for materials?
          - How many requests for digital materials v. paper v. both?
          - How many requests each month?
      • Wherever possible, automate digital preservation operations
          - Completeness and correctness increase
          - Staff become available for retrospective projects
          - Also automate ICPSR staff access to materials
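The three questions on slide 8 amount to simple tallies over help-desk tickets. A minimal sketch follows, assuming each ticket can be exported as a record with a requester, a material kind ("digital", "paper" or "both") and a date. The field names and the function `summarize_requests` are hypothetical and are not taken from ICPSR's help desk software.

```python
from collections import Counter
from datetime import date


def summarize_requests(requests):
    """Tally requests by requester, by material kind, and by month.

    Each request is a dict with hypothetical keys:
    'requester' (str), 'kind' ('digital' | 'paper' | 'both'), 'date' (date).
    """
    by_requester = Counter(r["requester"] for r in requests)
    by_kind = Counter(r["kind"] for r in requests)
    by_month = Counter(r["date"].strftime("%Y-%m") for r in requests)
    return by_requester, by_kind, by_month


if __name__ == "__main__":
    # Illustrative records only; not real ICPSR data.
    sample = [
        {"requester": "staff-a", "kind": "digital", "date": date(2007, 1, 5)},
        {"requester": "staff-b", "kind": "paper", "date": date(2007, 1, 20)},
        {"requester": "staff-a", "kind": "both", "date": date(2007, 2, 2)},
    ]
    for tally in summarize_requests(sample):
        print(dict(tally))
```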
  • 9. The Plan (more)
      • Transition ALL digital content from tape to disk
          - A copy on tape too is OK, but not primary copies
              • Expensive to access
              • Difficult to tell if copy A and copy B are in sync
      • Discard extraneous administrative documents
          - Just the “low hanging fruit”
      • Turn over remaining documents to records management professionals
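Once both copies sit on disk, the "are copy A and copy B in sync?" question from slide 9 can be answered by comparing digital fingerprints. The sketch below is one way to do that and is not ICPSR's actual tooling: it assumes each copy is a directory tree and uses SHA-256 as the fingerprint (the talk does not name an algorithm). The function names `fingerprint` and `compare_copies` are hypothetical.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_copies(root_a, root_b):
    """Compare two copies of an archive, file by file.

    Returns three sorted lists of relative paths:
    files missing from B, files missing from A, and files whose
    fingerprints differ between the two copies.
    """
    files_a, files_b = list_files(root_a), list_files(root_b)
    missing_in_b = sorted(files_a - files_b)
    missing_in_a = sorted(files_b - files_a)
    mismatched = sorted(
        rel
        for rel in files_a & files_b
        if fingerprint(Path(root_a) / rel) != fingerprint(Path(root_b) / rel)
    )
    return missing_in_b, missing_in_a, mismatched
```

Two copies are in sync when all three returned lists are empty. On tape the same check requires mounting and reading every cartridge, which is the access cost the slide points at.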
  • 10. Interlude - Comcast
      • An Internet connection at the Warehouse would be very helpful
          - Access to databases, Intranet
      • Thought we might purchase a broadband connection
      • We started with Comcast…
          - Comcast: “We’ll need to include an installation surcharge to cover a few extra installation costs.”
          - ICPSR: “How much?”
  • 11. Our reaction
      • Comcast: “Thirty-two thousand dollars.”
      • ICPSR: “Uh, no.”
      • The Warehouse now has an AT&T DSL connection
  • 12. Execution – moving to disk
      • DLT tape – bulk of our content – approx. 275 unique tapes
          - Two copies of each tape
              • ICPSR HQ
              • The Warehouse
          - Each tape holds 20 GB to 40 GB
      • During Feb – Jun 2007 ICPSR moved the content of these tapes to spinning disk
      • Starting in Jan 2007 ICPSR stopped using DLT tape for archival storage
  • 13. Execution – moving to disk
      • Approx. 5 TB of unique content across all tapes
      • How many copies?
          - (1) ICPSR – on-line
          - (1) ICPSR – off-line
          - (1-3) Chronopolis (SDSC, NCAR, UMd)
          - (2) IU HPSS
          - (0-5) LOCKSS-based, NDIIPP-funded syndicated storage
          - More?
      • Intending to destroy the DLT media at end of 2008
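The figures on slides 12 and 13 can be turned into a quick sizing estimate. The sketch below uses only numbers from the slides (275 tapes, 20 GB to 40 GB each, about 5 TB of unique content, and the per-site copy counts). It assumes decimal units (1 TB = 1000 GB) and ignores the open-ended "More?" entry. The names are illustrative.

```python
TAPE_COUNT = 275               # unique DLT tapes (slide 12)
TAPE_CAPACITY_GB = (20, 40)    # per-tape capacity range (slide 12)
UNIQUE_CONTENT_TB = 5          # approximate unique content (slide 13)

# (minimum, maximum) number of copies held at each location (slide 13).
COPY_PLAN = {
    "ICPSR on-line": (1, 1),
    "ICPSR off-line": (1, 1),
    "Chronopolis": (1, 3),
    "IU HPSS": (2, 2),
    "LOCKSS-based syndicated storage": (0, 5),
}


def tape_capacity_tb():
    """Raw capacity range of the tape set in TB (assuming 1 TB = 1000 GB)."""
    low_gb, high_gb = TAPE_CAPACITY_GB
    return TAPE_COUNT * low_gb / 1000, TAPE_COUNT * high_gb / 1000


def copy_range():
    """Minimum and maximum total number of copies across all locations."""
    low = sum(lo for lo, _ in COPY_PLAN.values())
    high = sum(hi for _, hi in COPY_PLAN.values())
    return low, high


def total_storage_tb():
    """Storage range needed to hold every planned copy, in TB."""
    low, high = copy_range()
    return low * UNIQUE_CONTENT_TB, high * UNIQUE_CONTENT_TB


if __name__ == "__main__":
    print("Tape set capacity (TB):", tape_capacity_tb())
    print("Copies (min, max):", copy_range())
    print("Total storage (TB):", total_storage_tb())
```

Under these assumptions the plan implies between 5 and 12 copies, or roughly 25 TB to 60 TB of storage spread across all partners.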
  • 14. Execution – moving to disk
      • Also have 2000 cartridge (3480) and 9-track tapes
      • Have been reading 50/week for many months now; will finish these before the end of 2008
          - High success rate for reading (> 80%)
      • Also had a stash of over 10k tapes that had already been migrated, but not discarded
          - For this we used extra special, extra gentle treatment…
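A quick check of the schedule on slide 14, using only the slide's own figures: 2000 tapes read at 50 per week, with a read success rate above 80%. The function names are illustrative.

```python
import math

TAPES = 2000            # cartridge (3480) and 9-track tapes
TAPES_PER_WEEK = 50     # reading rate
SUCCESS_FLOOR_PCT = 80  # slide reports a success rate above this


def weeks_to_finish(tapes=TAPES, per_week=TAPES_PER_WEEK):
    """Whole weeks needed to read every tape at a constant rate."""
    return math.ceil(tapes / per_week)


def readable_floor(tapes=TAPES, pct=SUCCESS_FLOOR_PCT):
    """Lower bound on tapes read successfully, using integer arithmetic."""
    return tapes * pct // 100


if __name__ == "__main__":
    print("Weeks to finish:", weeks_to_finish())
    print("Tapes read successfully (at least):", readable_floor())
```

At that rate the whole set takes 40 weeks, which is consistent with "many months" of reading and a finish before the end of 2008.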
  • 16. Who ya gonna call?
  • 19. Costs - media [bar chart; numbers are in thousands; series: Were, Now; categories: Master copy per TB, Backup copy per TB, Media mgmt]
  • 20. Costs – media (notes)
      • Were spending approx.
          - $2000/TB/copy on DLT tape
          - $65k/year staff to read, write, migrate and manage tapes
      • Now spending approx.
          - $2000/TB/copy for “expensive” SATA disk in our EMC
          - $100/TB/copy for LTO-3 tape
          - $0/TB/copy for off-site, on-line copies with our friends
          - Staff cost for plain old file and tape management can live on the margins
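The per-TB rates on slide 20 can be combined with the roughly 5 TB of unique content from slide 13 for a rough before-and-after media cost. This is a sketch under stated assumptions, not figures from the talk: it assumes two DLT copies before (slide 12), and one SATA disk copy plus one LTO-3 tape copy now, with off-site copies at $0. Staff cost is left out of the totals because the "now" figure is not quantified on the slide.

```python
UNIQUE_CONTENT_TB = 5  # from slide 13

# Cost per TB per copy, in dollars (slide 20).
DLT_PER_TB = 2000
SATA_PER_TB = 2000
LTO3_PER_TB = 100
OFFSITE_PER_TB = 0


def media_cost_before(tb=UNIQUE_CONTENT_TB, copies=2):
    """Media cost with every copy on DLT tape (two copies assumed)."""
    return DLT_PER_TB * tb * copies


def media_cost_now(tb=UNIQUE_CONTENT_TB, offsite_copies=1):
    """Media cost with one disk copy, one LTO-3 copy and free off-site copies.

    The one-plus-one split is an assumption made for this sketch.
    """
    return (SATA_PER_TB + LTO3_PER_TB + OFFSITE_PER_TB * offsite_copies) * tb


if __name__ == "__main__":
    print("Before (two DLT copies): $", media_cost_before())
    print("Now (disk + LTO-3 + off-site): $", media_cost_now())
```

Under these assumptions the media cost drops from $20,000 to $10,500, before counting the $65k/year of staff time previously spent managing tapes.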
  • 21. Execution – paper documents
      • Stored at the Warehouse
      • 3200 sq ft facility located near Ann Arbor airport
          - 2500 sq ft manufacturing space
          - 600 sq ft of office space (the three “Front Rooms”)
          - 100 sq ft of kitchenette, rest room
      • $35k/year for rent; $5k for utilities
  • 22. Bird’s eye – 1 of 3
  • 23. Bird’s eye – panning right
  • 24. Bird’s eye – panning right
  • 25. Execution – paper documents
      • Phase I (“clean up”)
          - Identify, gather and recycle paper with no archival value
              • File listings
              • Census 2000
          - Completed in 2007; recycled 40 cubic yards
      • Phase II (“clean out”)
          - Consolidate Administrative and Archival materials into acid-free folders stored in archival-quality boxes
          - In progress; expect to complete by the end of August 2008
  • 26. Costs – paper documents [bar chart; numbers are in thousands of dollars; series: Current, Planned; categories: Storage & Management, Retrieval & Returns, Supplies & Misc]
  • 27. Execution – automation
      • Digital Object Database
          - Database of metadata about every identified file in the archives
              • Digital fingerprint
              • Location
              • Source
      • Plugged into our ingest system and our dissemination system
      • Powers some really useful tools…
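A Digital Object Database of the kind described on slide 27 can be pictured as one table keyed by location, holding a fingerprint and a source for each file. The sketch below is illustrative only: the table name, column names, the choice of SQLite and SHA-256, and the functions `open_catalog`, `register` and `verify` are all assumptions and do not describe ICPSR's actual schema.

```python
import hashlib
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a catalog with one row per digital object."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS digital_object (
               location    TEXT PRIMARY KEY,  -- where the file lives
               fingerprint TEXT NOT NULL,     -- SHA-256 of its content
               source      TEXT NOT NULL      -- provenance, e.g. a deposit id
           )"""
    )
    return conn


def register(conn, location, content, source):
    """Record (or update) an object's fingerprint at ingest time."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO digital_object (location, fingerprint, source) "
        "VALUES (?, ?, ?)",
        (location, digest, source),
    )
    conn.commit()
    return digest


def verify(conn, location, content):
    """Return True if content matches the fingerprint stored for location."""
    row = conn.execute(
        "SELECT fingerprint FROM digital_object WHERE location = ?", (location,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(content).hexdigest()
```

An ingest system would call `register` when a file arrives, and a dissemination or audit job could call `verify` before serving or re-copying it. Content is passed as bytes here to keep the sketch short; a real tool would stream files from disk.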
  • 28. Execution – automation
      • Goodies for ICPSR staff
          - Download page has extra knob to view ALL files
          - Intranet tools that link
              • Internal Study Tracking System
              • Public-facing study download system
              • Private-facing digital preservation system
      • Immediate and direct access to all digital objects
  • 29. Looking forward
      • Lots of good progress so far…
          - Better access for ICPSR staff
          - More robust preservation
          - Reduced costs
              • But does the IT guy ever give up $ once he gets it?
      • But not done yet
          - Still need a “proper” digital preservation system
              • Fedora
  • 30. Looking forward (continued)
      • Long-term, off-site, on-line copies
          - Heavily subsidized today
          - What about the future costs?
      • What if we start preserving and disseminating much larger digital objects?
      • Restricted-access materials
          - Balancing good preservation v. securing sensitive data