The Century Archive Project “CAP”
Technology-Independent Information Storage
Steven H. McCown & Michael Leonhardt
Storage Technology Corporation
4 April 2002
McCown & Leonhardt – 4/4/2002
What is a “Document”?
• A document is:
– Letter, check, picture, plot, report, birth certificate, deed...
• A document is NOT:
– Database element
– Encoded record
– Encoded object
• Perhaps:
– ASCII record of transaction
– Image of database table
– etc.
Documents in a “Paperless” Environment
• 4.4 M tons of paper printed in 1995, rising to 5.9 M in 2000
• 790 B sheets from laser printers in 1996, rising to 1.2 T in 2001
• 810 B sheets from office copiers in 1996, rising to 1.1 T in 2001
• 21 billion letters sent
• 170 billion pages of fiche
• 60 billion checks processed each year
• E-mail has created 40% more (personal) printing
• Each additional $100 M in corporate revenue adds 8.8 M printed sheets
“To ensure that the media will be readable far into the future, it may
be necessary to archive the system along with the media. For a
100-year life, recording systems and sufficient spare parts will
need to be archived along with the data storage media. Media with
life expectancies greater than 20 years are capable of out-surviving
existing recording system technologies.”
-- John Van Bogart, NARA 11th Annual Preservation Conference,
“Magnetic Tape Storage”, 1996
Information Management
• Long-term storage
– Defined: in excess of 100 years
– Inherent to many domains such as genealogy
• Information Management strategies
– Usually based on frequent data migration
– Poor incorporation of long-term storage
• Problem:
– How to access today’s archives in 100 years or more
Long-Term Storage Wish List
• Easy integration with data processing environments
• Easy data access
• Migration-free
• Long-life media with “no maintenance”
• Reader technology independence
• Human-readable data
• Low cost
Current Options
• Encode the data and record digitally
– Magnetic media
– Optical media
• Store unencoded, human-readable images
– Microfilm
• Something new: “CAP”
Century Archive Project
• Features
– High-density storage of human-readable document images
– Storage of digitally encoded documents
– Metadata ascription to aid retrieval
– Industry-standard physical media form factor
– Patent filed on concepts and format
CAP Operations
• Scan document to create electronic (e.g. TIFF) file
• Write de-magnified image on optical tape with scanning laser
– Use WORM optical media
– Create analog record (human-readable)
– Append new documents as needed
• Write digitally encoded document file
• Read
– View magnified image (directly, or via CCD camera and monitor)
– Recover digital file
Tape Record Layout
Features
• Record header with document index and metadata
• Updatable Table of Contents
• Digital record in addition to analog record
– Retrieve digital version if compatible reader available
– Include digital header and TOC
• Grayscale documents
– Use half-toning technique
• Color documents
– Store separate images for the red, green and blue components
– Requires three-beam optics for direct color viewing
• Stereoscopic images
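The grayscale and color bullets above can be illustrated with a small sketch. The slides do not say which half-toning technique CAP uses; ordered dithering with a 2x2 Bayer matrix is assumed here purely as an example, and the function names `halftone` and `separate_rgb` are hypothetical.

```python
# Illustrative only: ordered dithering is an assumed stand-in for the
# unspecified "half-toning technique" named on the slide.
BAYER_2X2 = [[0, 2],
             [3, 1]]


def halftone(gray):
    """Turn rows of 0..255 gray values into rows of 0/1 (0 = black, 1 = white)."""
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, value in enumerate(row):
            # Each position in the 2x2 tile gets its own threshold, so a
            # uniform mid-gray area becomes a fine checkerboard of dots.
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) * 255 / 4
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out


def separate_rgb(pixels):
    """Split rows of (r, g, b) tuples into three single-channel planes."""
    reds = [[p[0] for p in row] for row in pixels]
    greens = [[p[1] for p in row] for row in pixels]
    blues = [[p[2] for p in row] for row in pixels]
    return reds, greens, blues


if __name__ == "__main__":
    mid_gray = [[128, 128], [128, 128]]
    print(halftone(mid_gray))  # half of the dots are on
    r, g, b = separate_rgb([[(200, 100, 50), (0, 255, 0)]])
    print(halftone(r), halftone(g), halftone(b))
```

Each color plane can be half-toned independently, which matches the idea of storing three separate images per color document.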
Adjacent Digital File
• TIFF file is reformatted with ECC and bit encoding
– Image file compressed using lossless compression
• Digital record format on tape:
– Width same as analog record
– Track spacing doubled to reduce crosstalk on read-out
– Length 1.5x to 2x analog record
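A minimal sketch of the "lossless compression plus ECC" idea on this slide. The slides do not name the compression scheme, the error-correcting code, or the bit encoding, so zlib and a Hamming(7,4) code are assumed here only as familiar stand-ins; the channel bit encoding is not modeled, and `encode_record` / `decode_record` are hypothetical names.

```python
import zlib


def hamming74_encode(nibble):
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in (3, 2, 1, 0)]
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(bits):
    """Correct up to one flipped bit in a 7-bit codeword and return the nibble."""
    b = list(bits)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s3 = b[3] ^ b[4] ^ b[5] ^ b[6]
    pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if pos:
        b[pos - 1] ^= 1
    return (b[2] << 3) | (b[4] << 2) | (b[5] << 1) | b[6]


def encode_record(data: bytes) -> list[int]:
    """Compress losslessly, then protect every nibble with Hamming(7,4)."""
    bits = []
    for byte in zlib.compress(data):
        bits += hamming74_encode(byte >> 4)
        bits += hamming74_encode(byte & 0x0F)
    return bits


def decode_record(bits: list[int]) -> bytes:
    """Undo encode_record, repairing single-bit errors per codeword."""
    nibbles = [hamming74_decode(bits[i:i + 7]) for i in range(0, len(bits), 7)]
    packed = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return zlib.decompress(packed)


if __name__ == "__main__":
    original = b"scanned page " * 50
    record = encode_record(original)
    record[10] ^= 1  # simulate one damaged bit on read-out
    assert decode_record(record) == original
```

A production format would more likely use a stronger code with interleaving; the point here is only that the digital record can survive isolated read errors and still reproduce the file exactly.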
Retrieved Image
Tape Format Example
• For 8 1/2” wide documents
– Scan images at 300 dpi
– Write images on tape at 25,400 dpi (about 85x reduction in size)
– Document length parallel to tape (accommodates different lengths)
– 5 document tracks across tape
• 1/2” tape in 3480 cartridge
– 200 m of tape
– 220,000 image documents
– 80,000 documents with both image and digital records
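The figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes an 11-inch document length and ignores record headers, the table of contents, and inter-record gaps, none of which are quantified in the slides, so it gives only a rough upper bound rather than the authors' own calculation.

```python
# Back-of-envelope check of the tape format example.
# Assumed (not from the slides): 11-inch pages, no headers, no gaps.
MM_PER_INCH = 25.4

scan_dpi = 300
write_dpi = 25_400
doc_width_in = 8.5
doc_length_in = 11.0   # assumption: letter-size page
tracks = 5
tape_length_m = 200

reduction = write_dpi / scan_dpi                   # about 84.7, quoted as 85x
pixel_pitch_um = MM_PER_INCH / write_dpi * 1000    # size of one written pixel
track_width_mm = doc_width_in * MM_PER_INCH / reduction
image_length_mm = doc_length_in * MM_PER_INCH / reduction

images_per_track = int(tape_length_m * 1000 / image_length_mm)
upper_bound = images_per_track * tracks

# With a digital record 1.5x to 2x the analog length, each document
# occupies 2.5x to 3x the tape of an image-only document.
image_only_docs = 220_000  # figure quoted on the slide
with_digital_high = image_only_docs / 2.5
with_digital_low = image_only_docs / 3.0

if __name__ == "__main__":
    print(f"reduction: {reduction:.1f}x, pixel pitch: {pixel_pitch_um:.2f} um")
    print(f"track width: {track_width_mm:.2f} mm, image length: {image_length_mm:.2f} mm")
    print(f"upper bound, image only: {upper_bound:,} documents")
    print(f"with digital records: {with_digital_low:,.0f} to {with_digital_high:,.0f}")
```

Under these assumptions the image-only upper bound comes out somewhat above the quoted 220,000, which leaves room for headers and gaps, and the quoted 80,000 falls inside the range implied by the 1.5x to 2x digital record length.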
Storage Costs ($/MB)
• Manually intensive
– Paper: $10.00
– Microfiche (volumetric improvement only): $1.20
• “Semi-automated” (manually mounted media)
– Non-automated magnetic tape
– Microfilm: $0.005 (media only, 16 mm)
• Full automation
– Magnetic tape: $0.004
– Optical disk: $0.03
– CAP: $0.002
Century Archive Summary
• Provides alternative storage method for valued documents, images
– Direct optical viewing
– Eliminates drive, media technology migration
– Robust media options for relaxed environmental storage conditions
• Provides digital storage
– Faster availability
– Data integration
• Complements magnetic tape storage of bulk records
Questions?
