Electronic Records


Published in: Education, Technology
  • We (Tim, Jackie, and myself) are working with Duke, Michigan State, and the University of Michigan on a SPEC Kit for electronic records; should follow up on this information
  • We deal with Penn State records and the records of outside individuals and organizations
  • Bit rot: the small electric charge of a bit in memory disperses, altering program code. Or: data gradually decays over time in memory or on physical media (CD/DVD)
  • Acquire was an actual board game about “high finance” invented by Sid Sackson in 1962. For acquiring records: send a disk? Upload via a web interface?
  • Developed by an organization of space agencies (such as NASA), the Consultative Committee for Space Data Systems. Submission Information Packages, Archival Information Packages, Dissemination Information Packages
  • Information packages contain the content, other information packaged with it (such as representation or preservation information), and information about the packaging
  • Dublin Core: Title, Date, Creator, Format, Subject, Rights. PREMIS: objects, rights, agents, events. XML: EAD is an XML-based standard
  • DDA: Java. Checksum: algorithm-generated value (128 bits for MD5) that serves as a fingerprint of the file and can be used to check file integrity. TRAC: OCLC & CRL (Center for Research Libraries) released criteria & checklist in 2007. Drambora: UK, self-assessment; risks are OK (they’re unavoidable), but identify and plan for them. WAS: Web Archiving Service, CDL
  • Recently read that at the NY State archives they have an initial “quarantine” machine, a processing machine, and a secure storage machine.
  • Could also try for emulation. Will depend on the item/collection
  • Digital forensics can be required to find out how an e-document was altered. For both, approaches will vary according to importance of collection & resources available
  • Future stuff: retention periods; routine fixity checks; object versioning/difference views; format migration tools; rights, access control, user accounts
  • Will be interesting to see how Hydra compares to CONTENTdm 6.0
  • Named after computer programmer Melvin Conway, 1968
  • Most of the funding is going to a full-time IT position to build the preservation environment. Extracting records from creators’ databases

    2. 2. TAKING OUR PULSE: THE OCLC RESEARCH SURVEY OF SPECIAL COLLECTIONS AND ARCHIVES [OCT 2010] 79% said they had born-digital material in their collections. Yet only 35% could estimate the extent of those materials, and 45% weren’t sure who was responsible for this material. “Undercollected, undercounted, undermanaged, unpreserved, inaccessible” -Jackie Dooley of OCLC
    3. 3. WHAT ARE WE TALKING ABOUT? Records:  Information, in a fixed form, used as a source of information about the past  Records have content, structure & context Special Collections:  Primary Sources (“Material that contains firsthand accounts of events and that was created contemporaneous to those events or later recalled by an eyewitness.”) Examples of records in Special Collections:  Meeting minutes  Letters  Diaries  Author’s manuscripts
    4. 4. ELECTRONIC RECORDS Written on a magnetic or optical medium, recorded in binary code, and accessed using computer software & hardware. The Board of Trustees meeting minutes are online as PDFs. People send letters via email. People keep diaries via blogs. Authors donate manuscripts on hard drives. Recently an artist donated her website!
    5. 5. ELECTRONIC UNIVERSITY RECORDS For PSU records, we must adhere to a records schedule: we must keep certain documentation for a certain amount of time, no matter its format. Some examples: Faculty Senate Course Proposals; University Web Bulletin; Newswires; Central Policies & Procedures Manuals. The University Archivist must be able to reconstruct events/decisions/procedures while demonstrating authenticity, reliability, integrity
    6. 6. ELECTRONIC UNIVERSITY RECORDS: CASE STUDY The head of an academic department is complaining to the Provost that he did not approve a course currently being taught by a new professor in his department. Course proposals must pass through 3 levels of approval. Course proposals are archived in digital format, and the three levels of approval are recorded through digital signatures. The Provost asks the University Archivist to retrieve the course proposal and verify that the department head signed off on it. The course proposal shows that it did indeed go through all appropriate approvals. The University Archivist must make the case that the department head’s (digital) signature is authentic. The University Archivist must also make sure that the version of the course proposal signed off on by the department head is the same version currently being taught.
    7. 7. P-RECORDS VS. E-RECORDS Both can take many forms. Both can come to us quite messy. For e-records: more copies, decentralized; authenticity can be harder to demonstrate; privacy may be harder to guarantee; less stable: viruses, accidental deletion, bit rot, formats become obsolete (remember floppy disks?). However, they are more amenable to batch processing and automated searching. We’re still talking archives
    8. 8. TRADITIONAL ARCHIVAL FUNCTIONS Appraise: Who created it and why? What does it document? Who might use it? Does it serve our mission? Is it authentic? Is it rare/valuable? Physical condition? Privacy issues? Acquire: Records schedule? Gift or purchase? Donor agreement? How to transport?
    9. 9. TRADITIONAL ARCHIVAL FUNCTIONS, CONT’D Accession: Establish physical, administrative & intellectual control; survey for formats, extent and condition; check for issues of privacy/confidentiality; re-house?; preliminary description; document access restrictions; assign secure location; assign processing priority. Arrange & Describe: Original order? Series? Sorting? To what level? Controlled vocabularies?
    10. 10. TRADITIONAL ARCHIVAL FUNCTIONS, CONT’D Preserve: appropriate environment; archival supplies; security; conservation (repair of individual items) considered separate. Make Accessible: Restrictions? Onsite/online; outreach. An e-records program will enable us to perform all these functions on all our e-records on an ongoing basis
    11. 11. WE NEED NEW: Staff, Models, Standards, Tools, Infrastructure/tech support, Policies, Workflows, Partnerships & a positive attitude towards change
    12. 12. STAFF: WHAT SKILLS DOES A DIGITAL ARCHIVIST NEED? http://blogs.loc.gov/digitalpreservation/2011/07/what-skills-does-a-digital-archivist-or-librarian-need/ Knowledge of formats & standards, but also: adaptability, flexibility; ability to bridge the gap between techies and not-so-techies; ability to communicate and advocate for what they do
    13. 13. MODELS: REFERENCE MODEL FOR AN OPEN ARCHIVAL INFORMATION SYSTEM (OAIS) An OAIS is the combination of systems and people necessary to preserve selected information over the long term and make it available for a “Designated Community”
    16. 16. MODELS: DIGITAL CURATION CENTRE - LIFECYCLE “Digital curation involves maintaining, preserving and adding value to digital research data throughout its lifecycle.”
    17. 17. STANDARDS Unicode (character coding system for worldwide interchange of text); Dublin Core (basic set of metadata elements to enable cross-searching); PREMIS (metadata for preservation); XML (set of rules for encoding documents, emphasizing simplicity, generality and usability); PDF/A (open standard for document exchange, specialized for digital preservation); etc. Cartoon by Rebecca Goldman, derangementanddescription.wordpress.com
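To make the Dublin Core and XML entries on this slide concrete, here is a minimal sketch of a Dublin Core description serialized as XML. The six element names follow the set listed in the speaker notes (title, date, creator, format, subject, rights). The `record` wrapper element and all field values are invented for illustration and do not come from any real collection.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Return an XML element holding one dc: child per (name, value) pair."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# All values below are made up for the example.
record = dublin_core_record({
    "title": "Board of Trustees Meeting Minutes",
    "date": "2011-07-15",
    "creator": "Board of Trustees",
    "format": "application/pdf",
    "subject": "University governance",
    "rights": "Open to researchers",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Because every repository fills in the same small set of elements, records like this one can be cross-searched regardless of which system produced them.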
    18. 18. TOOLS Duke Data Accessioner (http://www.duke.edu/~ses44/downloads/guide.pdf): copies data, using MD5 checksums; Droid & Jhove plug-ins identify file formats; creates XML wrapper. Virus scanner (Symantec). PII scanner (Identity Finder). TRAC: Trusted Repositories Audit & Certification. Drambora: Digital Repository Audit Method Based on Risk Assessment. Archive-IT/WAS (hosted service solutions). Etc.
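The Duke Data Accessioner entry on this slide mentions MD5 checksums and an XML wrapper. The sketch below shows the general idea only: it is not the Data Accessioner's code or its output format. The function names and the `accession`/`file` element names are assumptions made for this example.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def md5_of(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir):
    """Describe every file under source_dir as a <file> element (hypothetical schema)."""
    source = Path(source_dir)
    root = ET.Element("accession", {"source": source.name})
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        ET.SubElement(root, "file", {
            "path": path.relative_to(source).as_posix(),
            "size": str(path.stat().st_size),
            "md5": md5_of(path),
        })
    return root
```

Calling `ET.tostring(build_manifest("some_transfer"), encoding="unicode")` would yield a small XML document that travels with the files, so their checksums can be recomputed and compared later.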
    19. 19. TECHNICAL INFRASTRUCTURE/SUPPORT Hardware: e-records workstation in secure location; PC with network access; PC with secure (“dark”) storage; other equipment: a Mac would be nice, additional media readers (floppy, Zip), write blocker. Automated backup/disaster recovery. Discovery system
    20. 20. POLICIES Collection development policies. Service level agreements: What kind of storage can we secure? What kind of services will we offer prior to submission? Submission agreements: what file formats are accepted (ask for non-proprietary, non-lossy, widely adopted formats such as TIFF, PDF); what metadata is required; how transferred (web server? physical disk?). Use agreements: Who can access what materials, when, how, and for what purposes?
    21. 21. WORKFLOWS “Accession” (traditional): survey for formats, extent and condition (papers, photos, maps, linear/cubic feet, mold, insects); check for issues of privacy/confidentiality (correspondents, SSNs); re-house?; preliminary description; document access restrictions (Certain groups? Donor permission? 50 year hold?); assign secure location in stacks; assign processing priority. “Ingest” (electronic): survey for file formats, extent (MB/GB/TB? Files/folders?); scan for viruses; scan for PII; run checksum, copy to new disc, verify checksum; preliminary metadata; document access restrictions (Sensitive data? Need special software/hardware?); move to secure digital storage; assign processing priority
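Two of the electronic ingest steps on this slide lend themselves to a short sketch: surveying file formats and extent, and the "run checksum, copy, verify checksum" step. The function names are hypothetical, and using file extensions as a stand-in for format identification is a simplification (the tools slide lists Droid and Jhove for real format identification).

```python
import hashlib
import shutil
from collections import Counter
from pathlib import Path


def checksum(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file using the named hashlib algorithm."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def survey(source_dir):
    """Count files, total bytes, and file extensions under source_dir."""
    files = [p for p in Path(source_dir).rglob("*") if p.is_file()]
    return {
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "extensions": Counter(p.suffix.lower() or "(none)" for p in files),
    }


def copy_and_verify(source_file, dest_dir):
    """Checksum a file, copy it, checksum the copy, and fail on a mismatch."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    before = checksum(source_file)
    copied = Path(shutil.copy2(source_file, dest_dir))
    after = checksum(copied)
    if before != after:
        raise IOError(f"checksum mismatch after copying {source_file}")
    return copied, after
```

The checksum returned by `copy_and_verify` is worth recording alongside the preliminary metadata, since later fixity checks need a known-good value to compare against.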
    22. 22. PRESERVATION Traditional: format usually bound with content, stable. Digital: format can depend on encoding or applications (e.g. website with stylesheets, documents with MS formatting). So, what are we preserving? Just the information or also the “look and feel”? Bitstream replication, system preservation? Characterization: 25 bytes, 48 characters wide, “This is a video from YouTube”
    23. 23. PRESERVATION, CONTINUED Traditional: context/relationships can often be established through physical proximity or other cues. Digital: when you lift bits from physical media, how much context can/should you include? Metadata for relationships, hierarchies
    24. 24. PRESERVATION, CONTINUED Traditional: baseline preservation accomplished by providing controlled environment, proper storage & careful handling. Digital: controlled environment & proper storage and handling of physical media are important, but all media require periodic reformatting
    25. 25. PARTNERSHIPS: THE WAY FORWARD Not just I.T.! Need to partner with records creators - and their administrative support - early in their processes to encourage the use of archive-friendly formats and the production of good metadata!
    26. 26. CURATION ARCHITECTURE PROTOTYPE SERVICES (CAPS) Based on December 2009 platform review, which revealed inefficiencies & gaps, e.g. no platform for e-records. Explored microservices approach to digital curation: based on work by California Digital Library; small, self-contained, independent services; easier to develop, deploy, maintain, enhance, replace; interoperable: combine for more complex applications. “Small things...specialized jobs...only truly powerful when they work in concert...ZOMG ITS THE SMURFS” -Michael B. Klein
    27. 27. EXAMPLES OF MICROSERVICES Annotate: describe or catalog an object; Authenticate: authenticate a user; Authorize: authorize a user to access an object; Characterize: generate administrative metadata for an object; Identify: generate an identifier for an object; Inventory: record an object’s location on disk; Relate: relate two or more objects; Store: store an object on a filesystem; Verify: check the integrity/checksum/fixity of an object; Version: add a version to an object
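As a rough sketch of how small services like these can be combined, the functions below implement toy versions of Identify, Store, Inventory, and Verify and chain them into one ingest step. This is not the CAPS or California Digital Library code; the function signatures, the in-memory `index` dictionary, and the directory layout are all assumptions made for illustration.

```python
import hashlib
import shutil
import uuid
from pathlib import Path


def identify():
    """Identify: generate an identifier for an object (a random UUID here)."""
    return uuid.uuid4().hex


def store(source_file, repository_root, object_id):
    """Store: copy an object into its own directory on the filesystem."""
    target_dir = Path(repository_root) / object_id
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source_file, target_dir))


def fixity(path):
    """Helper: MD5 hex digest of a file's bytes (reads the whole file)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def inventory(index, object_id, stored_path, checksum):
    """Inventory: record an object's location on disk and its checksum."""
    index[object_id] = {"path": str(stored_path), "md5": checksum}


def verify(index, object_id):
    """Verify: recompute the stored file's checksum and compare to the record."""
    entry = index[object_id]
    return fixity(entry["path"]) == entry["md5"]


def ingest(source_file, repository_root, index):
    """Combine the small services above into one more complex operation."""
    object_id = identify()
    expected = fixity(source_file)  # checksum taken before the copy
    stored = store(source_file, repository_root, object_id)
    inventory(index, object_id, stored, expected)
    return object_id
```

Each function can be replaced on its own (for example, swapping the dictionary for a database) without touching the others, which is the property the microservices approach is after.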
    28. 28. WHO WAS DOING THIS EXPLORING? Representatives from: Scholarly Communications; I-Tech; DLT; Special Collections; Cataloging & Metadata. Stakeholders from 4 additional departments/libraries: Arts & Architecture, Digitization & Preservation, Maps, University Archives
    29. 29. PROCESS (OUTREACH & AGILITY) Daily meetings with core team. Weekly meetings with stakeholders. Constantly incorporating feedback into our work and reformulating long/short term goals. Never “no”, just “not now”. Progress tracked immediately on wiki. Led to buy-in from stakeholders. Developed prototype product in three months’ time
    31. 31. SCREENSHOT (1)
    32. 32. SCREENSHOT (2)
    33. 33. SCREENSHOT (3)
    34. 34. AIMS http://born-digital-archives.blogspot.com/ “Born Digital Collections: An Inter-institutional Model for Stewardship”. UVA, Stanford, Yale, University of Hull. Mellon grant, 13 born-digital collections. Framework for accessioning, arrangement & description, discovery. Uses Hydra, an open-source technical framework available under the Apache 2 license. Principal platforms are Fedora, Blacklight, Solr, Ruby on Rails. One body (digital repository), many heads (feature-rich asset management applications)
    35. 35. AIMS, CONTINUED One of AIMS’ Hydra “heads” is Hypatia. Allows archivists to arrange and describe born-digital assets. Includes the ability to: drag and drop to arrange, return to original order, view file types, add descriptive metadata, and apply rights & permissions (high level of granularity). Hypatia: Greek philosopher in Roman Egypt, “first notable woman in mathematics”
    36. 36. ARCHIVEMATICA Artefactual Systems, City of Vancouver, University of British Columbia, Rockefeller Archive Center & UNESCO. Microservices design pattern. “Integrated suite of open-source tools that allow users to process digital objects from ingest to access.” Based on Linux, written in Python. Utilizes METS, PREMIS, Dublin Core. Users monitor and control via a web-based dashboard. Implements media type preservation plans based on an analysis of the significant characteristics of file formats. Hydra fork: Rubymatica
    38. 38. POSITIVE ATTITUDE TOWARDS CHANGE… IT’S THE ONLY CONSTANT Keep up through: websites like http://www.dcc.ac.uk/; listservs like [digital-curation]; professional meetings (e.g. SAA, Open Repositories, Code4Lib, DLF); publications (e.g. American Archivist, Journal of Digital Information, First Monday, Digital Preservation Coalition’s Technology Watch Reports)
    39. 39. FIVE ORGANIZATIONAL STAGES OF PROGRAM DEVELOPMENT: Acknowledge: that digital curation is a shared concern. Act: initiate digital preservation projects. Consolidate: segue from projects to programs. Institutionalize: incorporate the larger environment. Externalize: embrace inter-institutional collaboration
    40. 40. BEWARE CONWAY’S LAW: Organizations produce designs that copy their communication structures (“Do not wait for a single, ultimate solution to emerge. The pieces of the puzzle are in place to build preservation environments” -DigCCurr 2009) Drawing by Rube Goldberg
    41. 41. TAKE THE LEAP Set up new DRA at e-records workstation (equipped with tools for ingest and access to secure storage). Select a group of records. Partner with the creators of those records. Agree on some policies to get those records submitted (policies should include standards). Have your new staff run the selected records through your new workflows using your new tools. Document successes, failures. Tweak, try again
    42. 42. SPARTAN ARCHIVE AT MICHIGAN STATE 3-year (began April 2010) NHPRC-funded ($250k) project. Appraisal, accession & ingest, preservation & management, online access for 3 large series from the Office of the Registrar: Catalog of Academic Programs; Course Descriptions; Annual Student Directory. Utilizing the integrated Rule-Oriented Data System (iRODS) distributed data grid solution. Project will result in new policies, procedures, institutional metadata standards, definitions for SIPs, AIPs and DIPs