Introduction to Digital Preservation

Given as part of a University of Maryland College of Information Studies course on preservation.

  1. Introduction to Digital Preservation
     William LeFurgy, 2/6/2013
  2. Session overview
     • Some definitions
     • Digital preservation challenges
     • Preservation strategies
     • Non-technical issues
       – Collaboration, institutional culture, legal issues, costs
     • Digital forensics
  3. Definitions
     • Digital stewardship, preservation, curation
       – Often used interchangeably to mean active management of digital information over time to ensure its accessibility; “stewardship” is the broadest term
     • Born digital
       – Information created in digital form (not digitized!)
     • Digitize (scan, reformat, “reborn” digital)
       – Create a digital copy of an analog original
     • Life cycle
       – Stages that digital content moves through from creation to preservation to access
     • Archive/Archival Store/Repository
       – System used to accept, store and access specified information with long-term value; provides for secure, redundant and managed administration; protects content and ensures ongoing access
  4. Pictures Rather than Words
     Atlas of Digital Damages on Flickr, http://ow.ly/hrJ7i
  5. The Digital Preservation Challenge
     • Libraries, archives, museums and other cultural heritage institutions have unparalleled experience managing analog items
     • Digital information is an existential test: institutions have to figure out a new way of doing business
     • Hard, because most institutions and staff have limited experience dealing with digital
     • Hard, too, because digital presents challenges
  6. Problem: Lots and Lots of Data
     • Huge volume of digital information, and it is rapidly growing
     • Organizations, governments and individuals are all information creators
     • Some large chunks of this information have value (actual or potential) from the perspective of archives/libraries
     • Which chunks to focus on?
  7. Problem: Information Complexity
     • Dynamic databases, websites
     • Sophisticated specialty uses: CGI, CAD/CAM, geospatial…
     • Highly specialized applications dependent on deep knowledge: scientific databases
  8. Problem: Technological Dependency/Obsolescence
     • Every piece of digital information depends on a stack of technologies working perfectly together, e.g.:
       – File format (pdf, html, doc)
       – Storage media (cloud, hard drive, USB drive)
       – Application software (reader, browser, app)
       – Operating system (Windows XP, Vista, 7)
       – Computing device (PC, laptop, smart phone)
     • Each layer of the stack is changing
     • Ensuring ongoing access requires work, careful planning
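One practical consequence of the file-format layer in the stack above is that a repository needs to know what format each file really is, regardless of its extension. The following is a minimal sketch, not a production identifier: the function name `identify_format` and the short `SIGNATURES` table are illustrative assumptions, and the table covers only a handful of well-known leading byte signatures.

```python
from pathlib import Path

# Illustrative table of well-known leading byte signatures ("magic numbers").
# It is deliberately tiny; real format registries hold thousands of entries.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_format(path: Path) -> str:
    """Guess a file's format from its first bytes, ignoring the extension."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"
```

A file named `report.doc` that begins with `%PDF-` would be reported as a PDF document, which is the kind of mismatch worth recording at ingest.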
  9. Osborne I, 1982: WordStar, 5.25” floppy, CP/M
  10. What is “Preservation”?
     • What does a system need to do with information to provide for adequate preservation and access, now and in the future?
     • Is saving the original files enough? Do they need to be converted/normalized?
     • What metadata needs to be available?
     • How important is original “look and feel” compared with information content?
     • Answers to such questions drive strategies, approaches
  11. Progress is Evident
     • A number of initiatives are tackling the issue around the world
     • Using some common principles, but different approaches
     • Reasons for optimism:
       – Important elements of the issue are defined
       – Solid conceptual framework exists
       – Biggest institutions are deeply engaged
       – Extensive cooperation, sharing, open development
  12. Mix of Institutional Strategies
     • Build institutional foundations
       – Provide mandate and policies for a preservation program
       – Trusted Digital Repositories/TRAC
     • Develop internal systems
       – Build an infrastructure (Proprietary? Open Source?)
     • Use external services
       – Pay for an existing infrastructure
     • Learn by doing
       – Identify/capture content, rely on iterative improvement
     • Collaborate
       – Work with others on shared approaches
     • Observe and wait
  13. Preservation Approaches
     • Differences of opinion now exist
     • Possible that future approaches will emerge
     • Three commonly accepted approaches today:
       – Bit preservation
       – Migration
       – Emulation
     • Can rely on one approach or a hybrid
  14. Approach: Bit Preservation
     Capture information in its original form and focus on maintaining data integrity: files are kept unchanged
     Advantages:
       • Lower cost
       • Scalable, practicable
       • Works well (so far)
     Disadvantages:
       • Useful life of data unclear
       • Future functionality (look and feel) at risk
  15. Approach: Migration
     Transform/normalize data into formats and structures that are optimal for preservation
     Advantages:
       • Homogeneous data easier to manage, access
       • Files are preserved with rich contextual metadata
       • Potential to solve preservation issues once and for all
     Disadvantages:
       • Complex ingest processes
       • Loss of data, functionality
       • Based on assumptions about future
       • IP issues major barrier
       • Scalability, practicality not proven
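As a small illustration of the migration approach, the sketch below normalizes a text file from a legacy character encoding to UTF-8 and writes a sidecar record describing what was done. Everything named here is an assumption made for the example: the function `migrate_text_to_utf8`, the `.migration.json` sidecar convention, and the record fields are hypothetical, and the caller is assumed to already know the source encoding. The original file is left untouched, in line with the deck's advice to keep the original files.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def migrate_text_to_utf8(source: Path, target: Path, source_encoding: str) -> dict:
    """Write a UTF-8 copy of `source` to `target` plus a JSON sidecar record.

    The source file is only read, never modified. Decoding raises
    UnicodeDecodeError when the stated encoding rejects the bytes.
    """
    raw = source.read_bytes()
    text = raw.decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))

    # Hypothetical provenance record: enough to trace the copy to its source.
    record = {
        "source": source.name,
        "source_encoding": source_encoding,
        "source_sha256": hashlib.sha256(raw).hexdigest(),
        "target": target.name,
        "target_encoding": "utf-8",
        "target_sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
        "migrated_at": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = target.with_name(target.name + ".migration.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

Even this toy case shows two of the listed disadvantages: the result depends on an assumption (the stated source encoding), and a wrong assumption can silently corrupt characters rather than fail.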
  16. Approach: Emulation
     Use software to mimic behavior of obsolete systems to access and use original data
     Advantages:
       • Look and feel preserved
       • Potential to solve access issues once and for all
       • No need to process original files
     Disadvantages:
       • Complex development: may need to emulate HW, OS, applications …
       • Technology a moving target: need many emulators to reflect changes
       • IP issues major barrier
       • Scalability, practicality not proven
       • Is the emulation right?
  17. Preferred Approaches Share Basic Ideas
     • No optimal system; iterative improvements will continue
     • Keep the original files
     • Active management essential
       – Move data to new storage media every ~5 years
       – Monitor data integrity with fixity checks
       – Ensure data remains accessible and interpretable
     • Make multiple copies and store separately
     • Modular approach to tools and services
     • Watch for changes in technology and user expectations
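The fixity checks mentioned above can be sketched as follows: record a checksum for every file when it is ingested, then recompute and compare on a schedule. This is a minimal sketch under stated assumptions, not a description of any particular repository system. The function names (`sha256_of`, `build_manifest`, `check_fixity`) and the choice of SHA-256 are illustrative.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root against a previously stored manifest."""
    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file():
            report["missing"].append(rel)
        elif sha256_of(target) != expected:
            report["changed"].append(rel)
        else:
            report["ok"].append(rel)
    return report
```

Running `check_fixity` with the same manifest against each separately stored copy also shows which copy is still intact and can be used to repair the others.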
  18. Preferred Approaches Are Open
     • Open architectures:
       – Allow adding, upgrading and swapping system components from different vendors and sources
       – Essential not to be locked into one approach: must be able to easily move data to new platform
       – Systems should support interoperability
     • Open standards:
       – Published, widely used, consensus based
       – Can include open source or commercial products
       – Key is transparent understanding of technical basis to enable data access, manipulation
  19. Important Non-Technical Issues
     • Collaboration: new models needed for institutions, communities to work together
     • Institutional culture: new policies, leaders need to integrate analog and digital management, staff need new skills
     • Cost: many variables; economic sustainability is an issue
  20. Copyrighted, Private, Confidential
     • Exceptions in U.S. copyright law for libraries & archives are outdated
       – 3 copy limit
     • Societal norms and expectations for privacy are shifting, especially on the Internet
     • Data mining and other techniques allow for new kinds of access and new policies
       – Social media, personal information
  21. Digital Forensics
     • Tools and approaches for protecting and extracting digital information
     • Special relevance for all types of digital media, personal digital archiving
     • Basic principles:
       – Acquire evidence without alteration
       – Do work in accountable, repeatable way
  22. Work with Current & Archaic Data
     • Must handle current digital information from mobile devices, networks, live data on remote computers, flash media, virtual machines, cloud services and encrypted sources
     • Also deal with older information on all imaginable media: 8” floppy disks, punch cards, ancient hard drives
     • Everything to do with computing is either obsolete or rapidly headed that way
  23. Personal Archiving
     • “Personal papers” increasingly digital
     • Social media, web largely driven by personal creation
     • Personal content characterized by highly inconsistent structures, formats, provenance
     • High risk of incompleteness, questionable authenticity
  24. Forensic Life Cycle (Partial)
     • Securing and Evaluating the Scene: ensure safety, confirm computer equipment present, secure equipment, identify and protect evidence, conduct interviews
     • Documenting the Scene: create a permanent record of the scene by means of photography and note taking, document condition and location of computers
     • Evidence Collection: collect computer hardware and media while preserving evidential value, obtain analogue evidence such as passwords, handwritten notes, computer manuals, printouts
     • Forensic Imaging and Copying: e.g. for a hard drive, removal of the physical disk from the computer, digital preview and capture using physical or logical disk acquisition with write blockers, followed by return of the original media to the evidence custodian
     Source: Digital Forensics and Preservation, DPC Technology Watch Report 12-03
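The imaging step, together with the two basic principles stated earlier (acquire without alteration, work in an accountable and repeatable way), can be illustrated with a short sketch. It is an assumption-laden example rather than a forensic tool: the function `acquire_image` and the JSON-lines log format are hypothetical, and opening the source read-only in software does not replace a hardware write blocker.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def acquire_image(source: Path, image: Path, log: Path, chunk_size: int = 1 << 20) -> str:
    """Copy `source` to a new `image` file, verify the copy, and log the result.

    The source is opened read-only. Mode "xb" refuses to overwrite an existing
    image, so an earlier acquisition cannot be clobbered by accident.
    """
    source_hash = hashlib.sha256()
    with source.open("rb") as src, image.open("xb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            source_hash.update(chunk)
            dst.write(chunk)

    # Re-read the image from disk so the check covers what was actually written.
    image_hash = hashlib.sha256()
    with image.open("rb") as img:
        for chunk in iter(lambda: img.read(chunk_size), b""):
            image_hash.update(chunk)

    verified = source_hash.hexdigest() == image_hash.hexdigest()
    entry = {
        "acquired_at": datetime.now(timezone.utc).isoformat(),
        "source": str(source),
        "image": str(image),
        "source_sha256": source_hash.hexdigest(),
        "image_sha256": image_hash.hexdigest(),
        "verified": verified,
    }
    # Append-only log: one JSON object per line, one line per acquisition.
    with log.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")

    if not verified:
        raise IOError(f"image hash does not match source hash for {source}")
    return image_hash.hexdigest()
```

The log line is what makes the work accountable: anyone can later rehash the image and confirm it still matches the value recorded at acquisition time.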
  25. Summary
     • Digital information presents tough issues in terms of preservation and access
     • Libraries and archives must address these issues even though there are no ideal solutions and some open questions
     • Initiatives are underway around the world testing different approaches to preservation
     • There are a number of significant non-technical issues
     • Digital preservation is also relevant on the personal level; digital forensics is an emerging sub-specialty
  26. For More Information: A Partial List
     • Digital preservation: an introduction, UKOLN, http://ow.ly/hpoWr
     • An Introduction to Digital Preservation, JISC Digital Media, http://ow.ly/hpp7A
     • Curation Reference Manual, Digital Curation Centre, http://ow.ly/hppeR
     • Digital Preservation Handbook, Digital Preservation Coalition, http://ow.ly/hppk2
     • Digital Preservation Management Tutorial, Inter-university Consortium for Political and Social Research, University of Michigan, http://ow.ly/hpprU
     • Harnessing the Power of Digital Data for Science and Society, Report of the Interagency Working Group on Digital Data to the Committee on Science of the National Science and Technology Council, http://ow.ly/hppxC
     • International Study on the Impact of Copyright Law on Digital Preservation, Library of Congress, JISC, OAK Law, SURFfoundation, http://ow.ly/hppBs
     • National Digital Information Infrastructure and Preservation Program, Library of Congress, http://ow.ly/hppHP
  27. For More Information: A Partial List-2
     • LIFE3: A Predictive Costing Tool For Digital Collections, Life Cycle Information for E-Literature, University College London Library Services and the British Library, http://ow.ly/hpoI7
     • Open Planets Foundation, http://ow.ly/htqEw
     • Preserving Moving Pictures and Sound, DPC Technology Watch Report 12-01, March 2012, http://ow.ly/hoYQx
     • Digital Forensics and Preservation, DPC Technology Watch Report 12-03, November 2012, http://ow.ly/hoZiW
     • Digital Forensics and Born Digital Content in Cultural Heritage Collections, http://ow.ly/hpnn3
     • Library of Congress digital preservation blog, The Signal, http://ow.ly/hpq0F
     • National Digital Stewardship Alliance, Digital Preservation Glossary, http://ow.ly/hua7X
