Preserving Content from Your Institutional Repository



Between institutional repositories and hosting journals, many libraries are becoming responsible for scholarly content in new ways. While PDFs are the most common format today, the unique, local, serial content may be in a variety of formats. These items may be digitized text, born-digital text, audio, video, or images. This presentation will discuss formats that will remain accessible through time (PDF/A, txt, xml) so that content is not locked in proprietary formats. It will also discuss options for backing up items and associated metadata, including simple back-ups, off-site storage of files, LOCKSS, Private LOCKSS Networks, and Portico. The presenters will offer suggestions for how to ensure your local content is being preserved properly.

Carol Ann Borchert
Coordinator for Serials, University of South Florida
Carol Ann Borchert has been the Coordinator for Serials at the University of South Florida (USF) since 2004. Previously, she was in the Reference and Government Documents departments at USF, and in several areas of the James B. Duke Library at Furman University. She holds an MLS from the University of Kentucky and an M.A. in Spanish from USF.

Wendy Robertson
University of Iowa
Wendy Robertson, Digital Scholarship Librarian, has worked as a librarian at The University of Iowa Libraries since 2001. Her previous work positions include Electronic Resources Systems Librarian in Enterprise Applications, Electronic Resources Management Unit Head in Technical Services, and Electronic Resources Technical Services Librarian in Serials. She holds an MLS from The University of Iowa.

Published in: Education, Technology

  • Platforms: Digital Commons/bepress, OJS, CONTENTdm, DSpace, Fedora, Eprints, other?
  • Bit rot can become a problem. How can it be handled?
      --Refreshing: transferring data between two instances of the same type of storage medium, particularly media that deteriorate, such as CD-ROMs
      --Migration: transferring data to a new system environment, or converting from one file format or operating system to another
      --Emulating: emulating an obsolete software platform; imitating an old operating system
      --Replication: keeping duplicate copies of data in one or more storage locations
      --Validating data integrity: fixity checking; systematically checking data to make sure there has been no bit rot and that the data has not changed or deteriorated
      --Metadata: information on the content and creation of a file, preservation history, etc.; technical metadata identifies file characteristics
      --LOCKSS uses a combination of these methods: copies (replication) are checked against each other (validating data integrity) to make sure they still match and there has been no data degradation
  • --Notify LOCKSS that the journal is available for preservation, and have the IR company/platform work with LOCKSS to allow access to the content for preservation
    --Light archive; if the site goes down, the content will be available through LOCKSS quickly
    --If you make major changes to metadata or files, notify them before making the alterations
    --Not all serials are necessarily appropriate for Global LOCKSS (e.g., newsletters, material that is quickly outdated or superseded)
  • Access vs. preservation copy—sometimes a smaller version of the file is in the IR as an accessible version, with a larger version of the file kept elsewhere as a preservation copy. This came up in the DC PLN discussion.
  • --Portico has an online form where you can recommend OA journals for them to include, or you can contact them directly for guidance
    --Must have a trigger event to release content:
        --When a publisher ceases operations and titles are no longer available from any other source
        --When a publisher ceases to publish and offer a title and it is not offered by another publisher or entity
        --When back issues are removed from a publisher’s offering and are not available elsewhere
        --Upon catastrophic failure by a publisher’s delivery platform for a sustained period of time
  • We’ve laid some groundwork on the subject, and this slide could be a whole presentation on its own, but just to get you thinking, here are factors to consider
  • Ties back to organizational and financial commitment; bookended this discussion with these two slides for a reason.
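The fixity checking described in the notes above (validating data integrity) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not how LOCKSS, Portico, or any repository platform implements it: the function names `sha256_of`, `build_manifest`, and `compare_to_manifest` are hypothetical, SHA-256 is simply one common checksum choice, and the manifest is assumed to be a plain mapping of relative file paths to checksums recorded at deposit time.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_to_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, changed, or unexpected.

    `manifest` is the record made earlier (hypothetical storage format);
    the current state of `root` is re-hashed and compared against it.
    """
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Running `compare_to_manifest` on a schedule against a stored manifest gives a basic completeness and integrity report. Running `build_manifest` on two replicas and comparing the results is a simplified version of checking copies against each other.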

    1. Preserving Content from Your Institutional Repository. Wendy C Robertson and Carol Ann Borchert. NASIG, Buffalo, N.Y., June 8 2013. This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
    2. The Signal
    3. An institutional repository is… “a permanent, institution-wide repository of diverse, locally produced digital works (e.g., article preprints and postprints, data sets, electronic theses and dissertations, learning objects, and technical reports) that is available for public use and supports metadata harvesting.” University of Houston Libraries, Institutional Repository Task Force. Institutional Repositories. SPEC Kit 292. July 2006. p.13
    4. An institutional repository is not… Most IRs currently are not preservation repositories; they do not meet all the criteria in Trustworthy Repositories Audit & Certification (TRAC) or other audits.
    5. 10 basic characteristics of digital preservation repositories (CRL)
        1. The repository commits to continuing maintenance of digital objects for identified community/communities.
        2. Demonstrates organizational fitness (including financial, staffing, and processes) to fulfill its commitment.
        3. Acquires and maintains requisite contractual and legal rights and fulfills responsibilities.
        4. Has an effective and efficient policy framework.
    6. 10 basic characteristics (cont.)
        5. Acquires and ingests digital objects based upon stated criteria that correspond to its commitments and capabilities.
        6. Maintains/ensures the integrity, authenticity and usability of digital objects it holds over time.
    7. 10 basic characteristics (cont.)
        7. Creates and maintains requisite metadata about actions taken on digital objects during preservation as well as about the relevant production, access support, and usage process contexts before preservation.
        8. Fulfills requisite dissemination requirements.
        9. Has a strategic program for preservation planning and action.
        10. Has technical infrastructure adequate to continuing maintenance and security of its digital objects.
    8. The year is 2100. Can you read your files?
    9. Our questions for you
        • Who has an IR?
        • What platform are you using?
        • Who’s backing it up?
        • Who’s part of a PLN?
        • Who’s having their IR journals preserved in LOCKSS or Portico?
        Question mark sign by Colin_K, on Flickr
    10. Localized disasters
    11. Fire, Flood
    12. War, Tsunami, Earthquake. © 2011 UMD Libraries
    13. Disasters with warning. Moving servers out of the University of Iowa Libraries, 2008. © 2008 The University of Iowa
    14. Disasters with no warning. University of South Florida, very localized flood
    15. Backups vs. preservation: “Disaster recovery strategies and backup systems are not sufficient to ensure survival and access to authentic digital resources over time. A backup is a short-term data recovery solution following loss or corruption and is fundamentally different to an electronic preservation archive.” JISC. Digital Preservation: Continued Access to Authentic Digital Assets (November 2006)
    16. Exit strategy: Make sure you can easily migrate all your content and metadata out of your system in a usable format.
    17. Test, test and test some more: Test that all files are as expected regarding structure and completeness.
    18. Persistent identifiers: Using persistent identifiers now will help if you move to a new repository in the future.
    19. Preserving the Web: You may want to archive institutional content that is not appropriate for an IR but which is appropriate for the library’s mission.
    20. Archive-It: Archive-It can preserve journals and other scholarly work from your institution that doesn’t go into your repository.
    21. Internet Archive: “The Montana State Library (MSL) last year moved a copy of its collection of 3000 born digital state publications to the Internet Archive (IA).” Chris Stockwell for Montana State Library, 12/29/2010
    22. IRs are a bit different… The copy of the document in the repository often is the only version you have.
    23. Access copy vs. preservation copy: Digitized content may have a preservation scan as well as the version which displays to the public.
    24. IRs have special problems… Automatically adding a cover page to brand and identify content changes the file, perhaps even removing accessibility features.
    25. File formats: When possible, use open file formats so content will remain accessible long into the future, but will you turn down content in other formats?
    26. PDF/A (ISO 19005-1:2005): PDF/A is an ISO standard “which provides a mechanism for representing electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems for creating or rendering the files.”
    27. U Iowa electronic theses & dissertations: 1931 PDFs and 7 XML documents. Supplemented by: 21 .avi, 1 .avp, 8 .doc, 2 .mov, 2 .mp3, 1 .mp4, 4 .mpg, 1 .mxf, 3 .NTS, 2 .pde, 6 .pdf, 4 .txt, 3 .wmv, 18 .xls, 2 .zip
    28. Public preservation policy: Make your preservation and submission policy clear so that contributors understand the risks of contributing a non-open format.
    29. Preservation metadata: PREMIS (PREservation Metadata Implementation Strategies). “Preservation metadata supports activities intended to ensure the long-term usability of a digital resource.” Caplan, p.3
    30. Digital provenance: “Metadata can help support authenticity by documenting the digital provenance of the resource — its chain of custody and authorized change history.” Caplan, Priscilla. Understanding PREMIS. Library of Congress, ©2009. p.3
    31. Methods of preserving data
        • Refreshing data
        • Migrating data
        • Emulating software platform
        • Replicating
        • Validating data integrity
        • Metadata
    32. Long-term preservation options
        • Global LOCKSS Network
        • Private LOCKSS Network
        • Portico
    33. Global LOCKSS Network
        • For e-journal content
        • Preserves the format as well as the content
        • Light archive
        • Adding journals to LOCKSS
        • Notify LOCKSS of metadata/file changes
        • Not all serials are appropriate for Global LOCKSS
    34. Private LOCKSS Network
        • All material from the IR
        • Need at least 7 nodes/destinations
        • Each should be a LOCKSS Alliance member
        • Set up policies and governance for the PLN
    35. Setting up policies for a PLN
        • How long is initial commitment?
        • How much notice to withdraw?
        • How do members remove data for withdrawn institution?
        • Does the group need a governing body or steering committee?
        • Will the PLN be a dark or light archive?
        • Do any of the members have embargoed materials?
    36. Examples of PLNs
    37. Portico
        • For e-books and e-journals
        • Source files converted to an archive format
        • Dark archive
        • Portico is responsible for future content migrations
        • Adding journals to Portico
        • Not all serials are appropriate for Portico
    38. Factors to consider in developing a formal preservation plan
        • Organizational & financial commitment
        • Stakeholders
        • Local backups vs. long-term preservation
        • Storage needs
        • Roles & responsibilities
        • Data ingestion
        • Policy on deletion of or embargoes for materials
        • Funding
        • Staff
    39. Organizational & financial commitment
        • What is the long-term financial commitment from your library or institution?
        • Do you have the support of the organization? From what level of administration?
    40. Stakeholders
        • Producers
        • Users
        • Owners
        • Managers
        • Funding authorities
        • Other parties?
    41. Local backups vs. long-term preservation
        • Definition of backups versus preservation
        • Metadata, content, software, or all of these?
        • How often and who is responsible?
        • PLN or other option for long-term preservation
    42. Storage needs
        • Disk space: How much space do you need? Who is responsible for maintaining disks?
        • Software: Which software will be required? Who migrates information as software needs change?
        • Equipment: What equipment will you need? Who will fund the equipment, set it up, maintain it?
    43. Roles & responsibilities
        • Who is implementing the plan?
        • Who is maintaining the data and how?
        • Who is providing support for accessing material and troubleshooting issues?
    44. Data ingestion
        • How are you getting data into the system for preservation or backup?
        • Will this be done in-house or outsourced to a third party?
        • How frequently and in what format?
    45. Funding vs. staffing
        • Is it easier to fund these efforts at your organization or staff them?
        • How well-staffed is your organization?
        • What kind of expertise do you have (or not have) in the library?
        • What level of commitment does your organization have to preserve digital information?
    46. Questions?
        • Wendy Robertson, Digital Scholarship Librarian, University of Iowa
        • Carol Ann Borchert, Coordinator for Serials, University of South Florida
    47. Sources
        • Ball, Alex. Preservation and Curation in Institutional Repositories. Digital Curation Centre, UKOLN, 2010. Version 1.3.
        • Caplan, Priscilla. Understanding PREMIS. Library of Congress, ©2009.
        • Digital Repository Audit Method Based On Risk Assessment (DRAMBORA). Glasgow, 2009.
        • JISC. Digital Preservation: Continued Access to Authentic Digital Assets (Nov. 2006).
    48. Sources
        • Nestor Working Group. Catalogue of Criteria for Trusted Digital Repositories. Frankfurt am Main, Dec. 2006. Urn: de:0008-2006060703
        • OpenDOAR Policies Tool.
        • Oettler, Alexandra. PDF/A in a Nutshell 2.0: PDF for long-term archiving. Berlin: Association for Digital Document Standards e. V., ©2013.
        • Pennock, Maureen. Web-Archiving. DPC Technology Watch Report 12-01 March 2013.
        • Reference Model for an Open Archival Information System (OAIS). Recommended Practice CCSDS 650.0-M-2. Magenta Book, June 2012.
    49. Sources
        • Trustworthy Repositories Audit & Certification: Criteria and Checklist (TRAC). Version 1.0. Feb 2007.
        • University of Houston Libraries, Institutional Repository Task Force. Institutional Repositories. SPEC Kit 292. July 2006.
        • University of Illinois at Urbana-Champaign. “IDEALS Digital Preservation Support Policy.” ©2013
        • University of Illinois at Urbana-Champaign. “Preparing Items for Deposit into IDEALS. File Format Recommendations” ©2013
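
The file-format slides above (25 to 27) suggest taking stock of what formats a repository actually holds. A minimal Python sketch of such an inventory follows. The names `PREFERRED_EXTENSIONS`, `tally_extensions`, and `flag_for_review` are hypothetical, and the preferred set is an illustrative assumption rather than an official list of open formats. A file extension is also only a hint: a `.pdf` file is not necessarily PDF/A, so this kind of tally is a starting point for review, not a format validation.

```python
from collections import Counter
from pathlib import Path

# Assumption: an illustrative allow-list, not an official registry of open formats.
PREFERRED_EXTENSIONS = {".pdf", ".txt", ".xml"}


def tally_extensions(filenames):
    """Count files per lower-cased extension ('' means no extension)."""
    return Counter(Path(name).suffix.lower() for name in filenames)


def flag_for_review(tally, preferred=PREFERRED_EXTENSIONS):
    """Return (extension, count) pairs outside the preferred set, most common first."""
    return [(ext, count) for ext, count in tally.most_common() if ext not in preferred]
```

For example, passing the names of supplementary thesis files to `tally_extensions` and then to `flag_for_review` would list the video, spreadsheet, and archive extensions whose long-term accessibility may need a policy decision.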