We (Tim, Jackie, and myself) are working with Duke, Michigan State, and U. Michigan on a SPEC kit for electronic records; should follow up on this information
We deal with Penn State records and the records of outside individuals and organizations
Bit rot - small electric charge of a bit in memory disperses, altering program code. Or, data gradually decaying over time in memory or on physical media (CD/DVD)
Acquire was an actual board game about “high finance” invented by Sid Sackson in 1962
For acquire – send a disk? Upload via a web interface?
Developed by an organization of space agencies (à la NASA), the Consultative Committee for Space Data Systems
Submission Information Packages, Archival Information Packages, Dissemination Information Packages
Information packages contain the content, other information packaged with it (such as representation or preservation information), and information about the packaging
Dublin Core: Title, Date, Creator, Format, Subject, Rights
PREMIS: objects, rights, agents, events
XML – EAD is an XML-based standard
DDA: Java
Checksum: algorithm-generated value (128 bits for MD5) that serves as a fingerprint of the file and can be used to check file integrity
TRAC: OCLC & CRL (Center for Research Libraries) released criteria & checklist in 2007
DRAMBORA: UK, self-assessment; risks are OK (they’re unavoidable), but identify and plan for them
WAS: Web Archiving Service, CDL
Recently read that at the NY State archives they have an initial “quarantine” machine, a processing machine, and a secure storage machine.
Could also try for emulation
Will depend on the item/collection
Digital forensics can be required to find out how an e-document was altered
For both, approaches will vary according to importance of collection & resources available
Will be interesting to see how Hydra compares to CONTENTdm 6.0
Named after computer programmer Melvin Conway, 1968
Most of the funding is going to a full-time IT position to build the preservation environment
Extracting records from creator’s databases
BUILDING AN ELECTRONIC RECORDS PROGRAM IN A MAJOR RESEARCH LIBRARY SPECIAL COLLECTIONS: POTENTIAL PATHS TO SUCCESS Michelle Belden, January 2012
TAKING OUR PULSE: THE OCLC RESEARCH SURVEY OF SPECIAL COLLECTIONS AND ARCHIVES [OCT 2010] 79% said they had born-digital material in their collections, yet only 35% could estimate the extent of those materials, and 45% weren’t sure who was responsible for this material. “Undercollected, undercounted, undermanaged, unpreserved, inaccessible” - Jackie Dooley of OCLC
WHAT ARE WE TALKING ABOUT? Records: Information, in a fixed form, used as a source of information about the past Records have content, structure & context Special Collections: Primary Sources (“Material that contains firsthand accounts of events and that was created contemporaneous to those events or later recalled by an eyewitness.”) Examples of records in Special Collections: Meeting minutes Letters Diaries Author’s manuscripts
ELECTRONIC RECORDS Written on magnetic or optical medium, recorded in binary code, and accessed using computer software & hardware The Board of Trustees Meeting Minutes are online as PDFs People send letters via email People keep diaries via blogs Authors donate manuscripts on hard drives Recently an artist donated her website!
ELECTRONIC UNIVERSITY RECORDS For PSU records, we must adhere to a records schedule. We must keep certain documentation for a certain amount of time, no matter its format. Some examples: Faculty Senate Course Proposals University Web Bulletin Newswires Central Policies & Procedures Manuals University Archivist must be able to reconstruct events/decisions/procedures While demonstrating authenticity, reliability, integrity
ELECTRONIC UNIVERSITY RECORDS – CASE STUDY
The head of an academic department is complaining to the Provost that he did not approve a course currently being taught by a new professor in his department.
Course proposals must pass through 3 levels of approval. Course proposals are archived in digital format, and the three levels of approval are recorded through digital signatures.
The Provost asks the University Archivist to retrieve the course proposal and verify that the department head signed off on it. The course proposal shows that it did indeed go through all appropriate approvals.
The University Archivist must make the case that the department head’s (digital) signature is authentic.
The University Archivist must also make sure that the version of the course proposal signed off on by the department head is the same version currently being taught.
P-RECORDS VS. E-RECORDS Both can take many forms Both can come to us quite messy For e-records: More copies, decentralized Authenticity can be harder to demonstrate Privacy may be harder to guarantee Less stable: viruses, accidental deletion, bit rot, formats become obsolete (remember floppy discs?) However, they are more amenable to batch processing and automated searching We’re still talking archives
TRADITIONAL ARCHIVAL FUNCTIONS
Appraise: Who created it and why? What does it document? Who might use it? Does it serve our mission? Is it authentic? Is it rare/valuable? Physical condition? Privacy issues?
Acquire: Records Schedule? Gift or Purchase? Donor agreement? How to transport?
TRADITIONAL ARCHIVAL FUNCTIONS, CONT’D
Accession: Establish physical, administrative & intellectual control. Survey for formats, extent and condition. Check for issues of privacy/confidentiality. Re-house? Preliminary description. Document access restrictions. Assign secure location. Assign processing priority.
Arrange & Describe: Original Order? Series? Sorting? To what level? Controlled vocabularies?
TRADITIONAL ARCHIVAL FUNCTIONS, CONT’D
Preserve: Appropriate environment. Archival supplies. Security. Conservation (repair of individual items) considered separate.
Make Accessible: Restrictions? Onsite/online. Outreach.
An e-records program will enable us to perform all these functions on all our e-records on an ongoing basis
WE NEED NEW: Staff Models Standards Tools Infrastructure/tech support Policies Workflows Partnerships & A positive attitude towards change
STAFF: WHAT SKILLS DOES A DIGITAL ARCHIVIST NEED? http://blogs.loc.gov/digitalpreservation/2011/07/what-skills-does-a-digital-archivist-or-librarian-need/ Knowledge of formats & standards, but also: Adaptability, flexibility Ability to bridge gap between techies and not-so-techies Ability to communicate and advocate for what they do
MODELS: REFERENCE MODEL FOR AN OPEN ARCHIVAL INFORMATION SYSTEM (OAIS) An OAIS is the combination of systems and people necessary to preserve selected information over the long term and make it available for a “Designated Community”
MODELS: DIGITAL CURATION CENTRE - LIFECYCLE “Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle.”
STANDARDS
Unicode (character coding system for worldwide interchange of text)
Dublin Core (basic set of metadata elements to enable cross-searching)
PREMIS (metadata for preservation)
XML (set of rules for encoding documents, emphasizing simplicity, generality and usability)
PDF/A (open standard for document exchange, specialized for digital preservation)
Etc.
Cartoon by Rebecca Goldman derangementanddescription.wordpress.com
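As an illustration of how Dublin Core and XML fit together, here is a minimal Python sketch that wraps a few Dublin Core elements in an XML record. The `record` wrapper element, the function name, and the sample values are assumptions made for illustration; only the Dublin Core element names and namespace come from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict) -> str:
    """Serialize a dict of Dublin Core element names to an XML string."""
    # "record" is a hypothetical wrapper element, not part of Dublin Core.
    root = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
example = dublin_core_record({
    "title": "Board of Trustees Meeting Minutes",
    "creator": "Hypothetical University Board of Trustees",
    "date": "2012-01-20",
    "format": "application/pdf",
})
print(example)
```

Because every element carries the same namespace, records produced this way by different repositories can be searched on the same fields, which is the cross-searching benefit noted above.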
TOOLS
Duke Data Accessioner http://www.duke.edu/~ses44/downloads/guide.pdf
Copies data, using MD5 checksums
DROID & JHOVE plug-ins – identify file formats
Creates XML wrapper
Virus Scanner (Symantec)
PII Scanner (Identity Finder)
TRAC: Trusted Repositories Audit & Certification
DRAMBORA: Digital Repository Audit Method Based on Risk Assessment
Archive-It/WAS (hosted service solutions)
Etc.
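The checksum-verified copy that tools like the Data Accessioner perform can be sketched in a few lines of Python. This is not the Data Accessioner's own code (that tool is written in Java); the function names below are hypothetical, and the sketch only shows the general idea: hash the source, copy it, hash the copy, and compare.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm both checksums match."""
    before = md5_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps timestamps
    after = md5_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source}")
    return before
```

The returned digest would normally be recorded alongside the file so that the same comparison can be repeated later to detect bit rot.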
TECHNICAL INFRASTRUCTURE/SUPPORT
Hardware: E-records workstation in secure location; PC with network access; PC with secure (“dark”) storage
Other equipment: Mac would be nice; additional media readers (floppy, zip); write blocker
Automated backup/disaster recovery
Discovery system
POLICIES
Collection development policies
Service level agreements: What kind of storage can we secure? What kind of services will we offer prior to submission?
Submission agreements: What file formats are accepted? (Ask for non-proprietary, non-lossy, widely adopted formats, e.g. TIFF, PDF.) What metadata is required? How are files transferred? (Web server? Physical disk?)
Use agreements: Who can access what materials, when, how, and for what purposes?
WORKFLOWS
“Accession” (traditional):
Survey for formats, extent and condition (papers, photos, maps, linear/cubic feet, mold, insects)
Check for issues of privacy/confidentiality (correspondents, SSNs)
Re-house?
Preliminary description
Document access restrictions (Certain groups? Donor permission? 50 year hold?)
Assign secure location in stacks
Assign processing priority
“Ingest” (electronic):
Survey for file formats, extent (MB/GB/TB? Files/folders?)
Scan for viruses
Scan for PII
Run checksum, copy to new disc, verify checksum
Preliminary metadata
Document access restrictions (Sensitive data? Need special software/hardware?)
Move to secure digital storage
Assign processing priority
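The first ingest step, surveying file formats and extent, can be roughed out in Python as below. This is a sketch under a loud assumption: it groups files by filename extension, which is only a hint about format. Tools such as DROID and JHOVE identify formats from file contents and are more reliable. The function name is hypothetical.

```python
from collections import Counter
from pathlib import Path


def survey(root: Path) -> dict:
    """Summarize file count, total bytes, and extensions under root."""
    extensions = Counter()
    total_bytes = 0
    file_count = 0
    for path in root.rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
            # Extension is only a rough stand-in for true format identification.
            extensions[path.suffix.lower() or "(none)"] += 1
    return {
        "files": file_count,
        "bytes": total_bytes,
        "extensions": dict(extensions),
    }
```

The result answers the "MB/GB/TB? Files/folders?" questions for a transfer before any processing priority is assigned.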
PRESERVATION
Traditional: Format usually bound with content, stable
Digital: Format can depend on encoding or applications, e.g. website with stylesheets, documents with MS formatting
So, what are we preserving? Just the information or also the “look and feel”? Bitstream replication, system preservation?
Characterization: 25 bytes, 48 characters wide, “This is a video from YouTube”
PRESERVATION, CONTINUED
Traditional: Context/relationships can often be established through physical proximity or other cues
Digital: When you lift bits from physical media, how much context can/should you include? Metadata for relationships, hierarchies
PRESERVATION, CONTINUED
Traditional: Baseline preservation accomplished by providing controlled environment, proper storage & careful handling.
Digital: Controlled environment & proper storage and handling of physical media are important, but all media require periodic reformatting
PARTNERSHIPS: THE WAY FORWARD Not just I.T.! Need to partner with records creators - and their administrative support - early in their processes to encourage the use of archive-friendly formats and the production of good metadata!
CURATION ARCHITECTURE PROTOTYPE SERVICES (CAPS) Based on December 2009 platform review, which revealed inefficiencies & gaps, e.g. no platform for e-records Explored microservices approach to digital curation Based on work by California Digital Library Small, self-contained, independent services Easier to develop, deploy, maintain, enhance, replace. Interoperable: combine for more complex applications. “Small things...specialized jobs...only truly powerful when they work in concert...ZOMG ITS THE SMURFS” – Michael B. Klein
EXAMPLES OF MICROSERVICES
Annotate - describe or catalog an object
Authenticate - authenticate a user
Authorize - authorize a user to access an object
Characterize - generate administrative metadata for an object
Identify - generate an identifier for an object
Inventory - record an object’s location on disk
Relate - relate two or more objects
Store - store an object on a filesystem
Verify - check the integrity/checksum/fixity of an object
Version - add a version to an object
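To show how such small services combine into a larger application, here is a minimal Python sketch in which plain functions stand in for the Identify, Store, Inventory, and Verify services. It is not the CAPS or California Digital Library implementation; the function names, the dict-based inventory, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import uuid
from pathlib import Path


def identify() -> str:
    """Identify: generate an identifier for an object."""
    return uuid.uuid4().hex


def store(identifier: str, content: bytes, root: Path) -> Path:
    """Store: write the object's bytes to the filesystem."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / identifier
    path.write_bytes(content)
    return path


def fixity(path: Path) -> str:
    """Compute a SHA-256 digest of the stored object."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Verify: check the stored object against its recorded digest."""
    return fixity(path) == expected


def ingest(content: bytes, root: Path, inventory: dict) -> str:
    """Combine Identify, Store, and Inventory into one ingest step."""
    identifier = identify()
    path = store(identifier, content, root)
    # Inventory: record the object's location and digest (hypothetical dict).
    inventory[identifier] = {"path": path, "sha256": fixity(path)}
    return identifier
```

Each function can be replaced on its own (for example, swapping the filesystem store for another backend) without touching the others, which is the maintainability argument for the microservices approach.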
WHO WAS DOING THIS EXPLORING? Representatives from: Scholarly Communications I-Tech DLT Special Collections Cataloging & Metadata Stakeholders from 4 additional departments/ libraries: Arts & Architecture, Digitization & Preservation, Maps, University Archives
PROCESS (OUTREACH & AGILITY) Daily meetings with core team Weekly meetings with stakeholders Constantly incorporating feedback into our work and reformulating long/short term goals Never “no” – just “not now” Progress tracked immediately on wiki Led to buy-in from stakeholders Developed prototype product in 3 months’ time
AIMS http://born-digital-archives.blogspot.com/ “Born Digital Collections: An Inter-institutional Model for Stewardship” UVA, Stanford, Yale, University of Hull Mellon Grant, 13 born-digital collections Framework for accessioning, arrangement & description, discovery Uses Hydra, an open-source technical framework available under Apache 2 license Principal platforms are Fedora, Blacklight, Solr, Ruby on Rails One body (digital repository) many heads (feature-rich asset management applications)
AIMS, CONTINUED
One of AIMS’ Hydra “heads” is Hypatia
Allows archivists to arrange and describe born-digital assets.
Includes the ability to: Drag and drop to arrange, Return to original order, View file types, Add descriptive metadata, and Apply rights & permissions (high level of granularity)
Hypatia: Greek philosopher in Roman Egypt, “first notable woman in mathematics”
ARCHIVEMATICA Artefactual Systems, City of Vancouver, University of British Columbia, Rockefeller Archive Center & UNESCO Microservices design pattern “Integrated suite of open-source tools that allow users to process digital objects from ingest to access.” Based on Linux, written in Python. Utilizes METS, PREMIS, Dublin Core. Users monitor and control via a web-based dashboard. Implements media type preservation plans based on an analysis of the significant characteristics of file formats. Hydra fork: Rubymatica
POSITIVE ATTITUDE TOWARDS CHANGE… IT’S THE ONLY CONSTANT
Keep up through:
•Websites like http://www.dcc.ac.uk/
•Listservs like [digital-curation]
•Professional meetings (e.g. SAA, Open Repositories, Code4Lib, DLF)
•Publications (e.g. American Archivist, Journal of Digital Information, First Monday, Digital Preservation Coalition’s Technology Watch Reports)
FIVE ORGANIZATIONAL STAGES OF PROGRAM DEVELOPMENT:
Acknowledge - that digital curation is a shared concern
Act - initiate digital preservation projects
Consolidate - segue from projects to programs
Institutionalize - incorporate the larger environment
Externalize - embrace inter-institutional collaboration
BEWARE CONWAY’S LAW: Organizations produce designs that copy their communication structures (“Do not wait for a single, ultimate solution to emerge. The pieces of the puzzle are in place to build preservation environments” – DigCCurr 2009) Drawing by Rube Goldberg
TAKE THE LEAP Set up new DRA at e-records workstation (Equipped with tools for ingest and access to secure storage) Select a group of records Partner with the creators of those records Agree on some policies to get those records submitted (Policies should include standards) Have your new staff run the selected records through your new workflows using your new tools Document successes, failures Tweak, try again
SPARTAN ARCHIVE AT MICHIGAN STATE 3-year (began April 2010) NHPRC-funded ($250k) project Appraisal, accession & ingest, preservation & management, on-line access for 3 large series from Office of the Registrar: Catalog of Academic Programs, Course Descriptions, Annual Student Directory Utilizing the Integrated Rule-Oriented Data System (iRODS) distributed data grid solution Project will result in new policies, procedures, institutional metadata standards, definitions for SIPs, AIPs and DIPs