Electronic Records
1. BUILDING AN ELECTRONIC RECORDS PROGRAM
IN A MAJOR RESEARCH LIBRARY
SPECIAL COLLECTIONS:
POTENTIAL PATHS TO SUCCESS
Michelle Belden
January 2012
2. TAKING OUR PULSE:
THE OCLC RESEARCH SURVEY
OF SPECIAL COLLECTIONS AND ARCHIVES [OCT 2010]
79% said they had born-digital material in their collections,
Yet, only 35% could estimate the extent of those materials, and
45% weren’t sure who was responsible for this material.
“Undercollected, undercounted, undermanaged, unpreserved, inaccessible” -Jackie Dooley of OCLC
3. WHAT ARE WE TALKING ABOUT?
Records:
Information, in a fixed form, used as a source of information
about the past
Records have content, structure & context
Special Collections:
Primary Sources (“Material that contains firsthand accounts of
events and that was created contemporaneous to those events
or later recalled by an eyewitness.”)
Examples of records in Special Collections:
Meeting minutes
Letters
Diaries
Author’s manuscripts
4. ELECTRONIC RECORDS
Written on magnetic or optical medium, recorded in binary
code, and accessed using computer software & hardware
The Board of Trustees Meeting Minutes are online as PDFs
People send letters via email
People keep diaries via blogs
Authors donate manuscripts on hard drives
Recently an artist donated her website!
5. ELECTRONIC UNIVERSITY RECORDS
For PSU records, we must adhere to a records schedule.
We must keep certain documentation for a certain amount of
time, no matter its format.
Some examples:
Faculty Senate Course Proposals
University Web Bulletin
Newswires
Central Policies & Procedures Manuals
University Archivist must be able to reconstruct
events/decisions/procedures
While demonstrating authenticity, reliability, integrity
6. ELECTRONIC UNIVERSITY RECORDS – CASE
STUDY
The head of an academic department is complaining to the
Provost that he did not approve a course currently being
taught by a new professor in his department.
Course proposals must pass through 3 levels of approval.
Course proposals are archived in digital format, and the three
layers of approval are recorded through digital signatures.
The Provost asks the University Archivist to retrieve the course
proposal and verify that the department head signed off on it.
The course proposal shows that indeed it went through all
appropriate approvals. The University Archivist must make the
case that the department head's (digital) signature is
authentic.
The University Archivist must also make sure that the version of
the course proposal signed off on by the department head is
the same version currently being taught.
7. P-RECORDS VS. E-RECORDS
Both can take many forms
Both can come to us quite messy
For e-records:
More copies, decentralized
Authenticity can be harder to demonstrate
Privacy may be harder to guarantee
Less stable: viruses, accidental deletion, bit rot, formats
become obsolete (remember floppy discs?)
However, they are more amenable to batch processing
and automated searching
We’re still talking archives
8. TRADITIONAL ARCHIVAL FUNCTIONS
Appraise:
Who created it and why?
What does it document?
Who might use it?
Does it serve our mission?
Is it authentic?
Is it rare/valuable?
Physical condition?
Privacy issues?
Acquire:
Records Schedule?
Gift or Purchase?
Donor agreement?
How to transport?
9. TRADITIONAL ARCHIVAL FUNCTIONS, CONT’D
Accession:
Establish physical, administrative & intellectual control
Survey for formats, extent and condition
Check for issues of privacy/confidentiality
Re-house?
Preliminary description
Document access restrictions
Assign secure location
Assign processing priority
Arrange & Describe:
Original Order?
Series?
Sorting?
To what level?
Controlled vocabularies?
10. TRADITIONAL ARCHIVAL FUNCTIONS, CONT’D
Preserve:
Appropriate environment
Archival supplies
Security
Conservation (repair of individual items) considered separate
Make Accessible:
Restrictions?
Onsite/online
Outreach
An E-records program
will enable us to perform all these functions
on all our e-records
on an ongoing basis
11. WE NEED NEW:
Staff
Models
Standards
Tools
Infrastructure/tech support
Policies
Workflows
Partnerships
& A positive attitude towards change
12. STAFF:
WHAT SKILLS DOES A DIGITAL ARCHIVIST NEED?
http://blogs.loc.gov/digitalpreservation/2011/07/what-skills-does-a-digital-archivist-or-librarian-need/
Knowledge of formats &
standards, but also:
Adaptability, flexibility
Ability to bridge gap between
techies and not-so-techies
Ability to communicate and
advocate for what they do
13. MODELS: REFERENCE MODEL FOR AN OPEN
ARCHIVAL INFORMATION SYSTEM (OAIS)
An OAIS is the combination of systems and people necessary to preserve
selected information over the long term and make it available for a
“Designated Community”
16. MODELS: DIGITAL CURATION CENTRE - LIFECYCLE
“Digital curation involves maintaining, preserving and adding value to
digital research data throughout its lifecycle.”
17. STANDARDS
UNICODE
(character coding system for worldwide interchange of text)
Dublin Core
(basic set of metadata elements to enable cross-searching)
PREMIS
(metadata for preservation)
XML
(set of rules for encoding documents, emphasizing
simplicity, generality and usability)
PDF/A
(open standard for document exchange, specialized for digital
preservation)
Etc.
Cartoon by Rebecca Goldman
derangementanddescription.wordpress.com
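To make the Dublin Core idea concrete, here is a minimal sketch of a flat record built with Python's standard library. The element names (title, creator, date, format) come from the Dublin Core element set; the function name `dublin_core_record`, the `record` wrapper element, and all the sample values are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a flat XML record from a dict of Dublin Core element -> value.

    The <record> wrapper is an assumption for this sketch; real systems
    usually embed Dublin Core inside their own container format.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample values, for illustration only.
example = dublin_core_record({
    "title": "Board of Trustees Meeting Minutes",
    "creator": "Board of Trustees",
    "date": "2011-11-18",
    "format": "application/pdf",
})
print(example)
```

Because every repository uses the same small set of element names, records like this one can be searched across collections without knowing each system's internal schema.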
18. TOOLS
Duke Data Accessioner
http://www.duke.edu/~ses44/downloads/guide.pdf
Copies data, using MD5 checksums
DROID & JHOVE plug-ins – identify file formats
Creates XML wrapper
Virus Scanner (Symantec)
PII Scanner (Identity Finder)
TRAC
Trusted Repositories Audit & Certification
DRAMBORA
Digital Repository Audit Method Based on Risk Assessment
Archive-It/WAS (hosted service solutions)
Etc.
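The "copy data, using MD5 checksums" step can be sketched roughly as below. This is not the Data Accessioner's own code, only an illustration of the general technique; the function names `md5_of` and `copy_with_fixity` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source, destination):
    """Copy a file, then confirm the copy's checksum matches the original's.

    Returns the checksum so it can be stored alongside the file.
    Raises OSError if the two checksums differ.
    """
    source, destination = Path(source), Path(destination)
    before = md5_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps timestamps where possible
    after = md5_of(destination)
    if before != after:
        raise OSError(f"Checksum mismatch copying {source} to {destination}")
    return before
```

Saving the returned checksum with the file lets the same comparison be repeated later to detect silent corruption.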
19. TECHNICAL INFRASTRUCTURE/SUPPORT
Hardware: E-records workstation in secure location
PC with network access
PC with secure (“dark”) storage
Other equipment: Mac would be nice, additional media
readers (floppy, zip), write blocker.
Automated backup/disaster recovery
Discovery System
20. POLICIES
Collection Development policies
Service Level agreements
What kind of storage can we secure?
What kind of services will we offer prior to submission?
Submission agreements
What file formats accepted
Ask for non-proprietary, non-lossy, widely adopted (Tiff, PDF)
What metadata required
How transferred (Web server? Physical disk?)
Use agreements
Who can access what materials, when, how, and for
what purposes?
21. WORKFLOWS
“Accession” (traditional):
Survey for formats, extent and condition (papers, photos, maps, linear/cubic feet, mold, insects)
Check for issues of privacy/confidentiality (correspondents, SSNs)
Re-house?
Preliminary description
Document access restrictions (Certain groups? Donor permission? 50 year hold?)
Assign secure location in stacks
Assign processing priority
“Ingest” (electronic):
Survey for file formats, extent (MB/GB/TB? Files/folders?)
Scan for viruses
Scan for PII
Run checksum, copy to new disc, verify checksum
Preliminary metadata
Document access restrictions (Sensitive data? Need special software/hardware?)
Move to secure digital storage
Assign processing priority
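The first ingest step, surveying file formats and extent, might look something like this sketch. It groups files by extension, which is only a rough proxy for format (tools such as DROID and JHOVE inspect the files themselves); the function name `survey` and the shape of the returned dictionary are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def survey(root):
    """Walk a directory tree and summarize its extent.

    Returns the number of files, the total size in bytes, and a count of
    files per (lowercased) extension. Extensions are a weak stand-in for
    real format identification.
    """
    counts = Counter()
    total_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
            total_bytes += path.stat().st_size
    return {
        "files": sum(counts.values()),
        "bytes": total_bytes,
        "by_extension": dict(counts),
    }
```

A summary like this answers the "MB/GB/TB? Files/folders?" questions before any decisions about storage or processing priority are made.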
22. PRESERVATION
Traditional:
Format usually bound with content, stable
Digital:
Format can depend on encoding or applications, e.g. website with stylesheets, documents with MS formatting
So, what are we preserving? Just the information or also the “look and feel”?
Bitstream replication, system preservation?
Characterization: 25 bytes, 48 characters wide, “This is a video from YouTube”
23. PRESERVATION, CONTINUED
Traditional:
Context/relationships can often be established through physical proximity or other cues
Digital:
When lifting bits from physical media – how much context can/should you include?
Metadata for relationships, hierarchies
24. PRESERVATION CONTINUED
Traditional:
Baseline preservation accomplished by providing controlled environment, proper storage & careful handling.
Digital:
Controlled environment & proper storage and handling of physical media are important, but all media require periodic reformatting
25. PARTNERSHIPS: THE WAY FORWARD
Not just I.T.!
Need to partner with records creators - and their
administrative support - early in their processes to
encourage the use of archive-friendly formats and
the production of good metadata!
26. CURATION ARCHITECTURE PROTOTYPE SERVICES
(CAPS)
Based on December 2009 platform review, which revealed
inefficiencies & gaps, e.g. no platform for e-records
Explored microservices approach to digital curation
Based on work by California Digital Library
Small, self-contained, independent services
Easier to develop, deploy, maintain, enhance, replace.
Interoperable: combine for more complex applications.
“Small things...specialized jobs...only truly powerful when they work in
concert...ZOMG IT'S THE SMURFS” –Michael B. Klein
27. EXAMPLES OF MICROSERVICES
Annotate - describe or catalog an object
Authenticate - authenticate a user
Authorize - authorize a user to access an object
Characterize - generate administrative metadata for
an object
Identify - generate an identifier for an object
Inventory - record an object's location on disk
Relate - relate two or more objects
Store - store an object on a filesystem
Verify - check the integrity/checksum/fixity of an
object
Version - add a version to an object
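As a rough illustration of how such small services combine, the sketch below implements toy versions of Identify, Characterize, Inventory and Verify and chains them into an ingest step. It is not the CAPS or California Digital Library code; every function name, the in-memory `registry` dictionary, and the choice of MD5 for fixity are assumptions made for this example.

```python
import hashlib
import uuid
from pathlib import Path


def identify():
    """Identify: generate a new identifier for an object."""
    return uuid.uuid4().hex


def characterize(path):
    """Characterize: generate basic administrative metadata for a file."""
    path = Path(path)
    return {"size_bytes": path.stat().st_size, "extension": path.suffix.lower()}


def fixity(path):
    """Compute an MD5 checksum of the file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def inventory(registry, object_id, path):
    """Inventory: record an object's location, checksum and metadata."""
    registry[object_id] = {"path": str(path), "md5": fixity(path), **characterize(path)}
    return registry[object_id]


def verify(registry, object_id):
    """Verify: check that the file still matches its recorded checksum."""
    entry = registry[object_id]
    return fixity(entry["path"]) == entry["md5"]


def ingest(registry, path):
    """Combine the small services above into one larger operation."""
    object_id = identify()
    inventory(registry, object_id, path)
    return object_id
```

Each function does one job and knows nothing about the others, so any one of them could be replaced (for example, swapping the dictionary for a database) without touching the rest.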
28. WHO WAS DOING THIS EXPLORING?
Representatives from:
Scholarly Communications
I-Tech
DLT
Special Collections
Cataloging & Metadata
Stakeholders from 4 additional departments/ libraries:
Arts & Architecture, Digitization & Preservation, Maps, University Archives
29. PROCESS (OUTREACH & AGILITY)
Daily meetings with core team
Weekly meetings with stakeholders
Constantly incorporating feedback into our work
and reformulating long/short term goals
Never “no” – just “not now”
Progress tracked immediately on wiki
Led to buy-in from stakeholders
Developed prototype product in 3 months' time
34. AIMS
http://born-digital-archives.blogspot.com/
“Born Digital Collections: An Inter-institutional Model for Stewardship”
UVA, Stanford, Yale, University of Hull
Mellon Grant, 13 born-digital collections
Framework for accessioning, arrangement & description, discovery
Uses Hydra, an open-source technical framework available under
Apache 2 license
Principal platforms are Fedora, Blacklight, Solr, Ruby on Rails
One body (digital repository) many heads (feature-rich asset
management applications)
35. AIMS, CONTINUED
One of AIMS’ Hydra “heads” is Hypatia
Allows archivists to arrange and describe
born-digital assets.
Includes the ability to:
Drag and drop to arrange,
Return to original order,
View file types,
Add descriptive metadata, and
Apply rights & permissions (high level of granularity)
Hypatia: Greek philosopher in Roman Egypt, “first notable woman in mathematics”
36. ARCHIVEMATICA
Artefactual Systems, City of Vancouver, University of British
Columbia, Rockefeller Center & UNESCO
Microservices design pattern
“Integrated suite of open-source tools that allow users to process digital
objects from ingest to access.”
Based on Linux, written in Python.
Utilizes METS, PREMIS, Dublin Core.
Users monitor and control via a web-based dashboard.
Implements media type preservation plans based on an analysis of the
significant characteristics of file formats.
Hydra fork: Rubymatica
38. POSITIVE ATTITUDE TOWARDS CHANGE…
IT’S THE ONLY CONSTANT
Keep up through:
•Websites like http://www.dcc.ac.uk/
•Listservs like [digital-curation]
•Professional meetings (e.g. SAA, Open Repositories, Code4Lib, DLF)
•Publications (e.g. American Archivist, Journal of Digital Information, First
Monday, Digital Preservation Coalition's Technology Watch Reports)
39. FIVE ORGANIZATIONAL STAGES
OF PROGRAM DEVELOPMENT:
Acknowledge - that digital curation is a shared concern
Act - initiate digital preservation projects
Consolidate - segue from projects to programs
Institutionalize - incorporate the larger environment
Externalize - embrace inter-institutional collaboration
40. BEWARE CONWAY'S LAW:
Organizations produce designs that copy their
communication structures
(“Do not wait for a single, ultimate solution to emerge. The
pieces of the puzzle are in place to build preservation
environments” –DigCCurr 2009)
Drawing by Rube Goldberg
41. TAKE THE LEAP
Set up new DRA at e-records workstation
(Equipped with tools for ingest and access to secure
storage)
Select a group of records
Partner with the creators of those records
Agree on some policies to get those records
submitted
(Policies should include standards)
Have your new staff run the selected records
through your new workflows using your new tools
Document successes, failures
Tweak, try again
42. SPARTAN ARCHIVE AT MICHIGAN STATE
3-year (began April 2010) NHPRC-funded ($250k) project
Appraisal, accession & ingest, preservation &
management, on-line access for 3 large series from Office
of Registrar:
Catalog of Academic Programs
Course Descriptions
Annual Student Directory
Utilizing Integrated Rule-Oriented Data System (iRODS)
distributed data grid solution
Project will result in new policies, procedures, institutional
metadata standards, definitions for SIPs, AIPs and DIPs
Editor's Notes
We (Tim, Jackie, myself) are working with Duke & Michigan State & U. Michigan on a SPEC kit for electronic records; should follow up on this information
We deal with Penn State records and the records of outside individuals and organizations
Bit rot - the small electric charge of a bit in memory disperses, altering program code. Or, data gradually decaying over time from memory or physical media (CD/DVD)
Acquire was an actual board game about “high finance” invented by Sid Sackson in 1962
For acquire – send a disk? Upload via a web interface?
Developed by an organization of Space Agencies (à la NASA), the Consultative Committee for Space Data Systems
Submission Information Packages, Archival Information Packages, Dissemination Information Packages
Information packages contain content, packaged other information, like representation or preservation, and information about the packaging
Dublin Core: Title, Date, Creator, Format, Subject, Rights
PREMIS: objects, rights, agents, events
XML – EAD is a subset
DDA: Java
Checksum: algorithm-generated 128-bit value that serves as a fingerprint of the file, can be used to check file integrity
TRAC - OCLC & CRL (Center for Research Libraries) released criteria & checklist in 2007
DRAMBORA – UK, self-assessment, risks are ok (they’re unavoidable) – but identify and plan for them
WAS - web archiving service, CDL
Recently read that at the NY State archives they have an initial “quarantine” machine, a processing machine, and a secure storage machine.
Could also try for emulation
Will depend on the item/collection
Digital forensics can be required to find out how an e-document was altered
For both, approaches will vary according to importance of collection & resources available