Pekin eca2010-v2


Gareth Knight's presentation at ECA2010 conference, 28-30 April 2010

  • I will present a case study of work being performed at King’s College London to revise its records management strategy to cater for current needs. Themes: re-appraising the definition & criteria for a Record; challenges posed when attempting to archive digital Records; technical architecture and processes that may be developed to curate digital records.
  • Introduce record and contrast to information
  • Key themes of a record
  • Introduce notion that the criteria for the definition of a record are influenced by external factors and therefore subject to change over time. Records are a specialized form of information. Essentially, records are information produced consciously or as by-products of business activities or transactions and retained because of their value. Legal compliance requirements can change over time. Subject to changing understanding of business activities.
  • KCL Archives & Information Management (AIM) operates as a central unit. Institutional business records range from records created in the course of day-to-day business to the private papers of academics. They ensure scrutiny of decision making by a wider audience and demonstrate transparency and openness within publicly funded bodies. What do you keep and for how long? Increasing paper content places strain upon physical storage space and increases storage costs. Increasing staff time is necessary to locate material to fulfil FOI requests. Many, but possibly not all, records provide a specific function.
  • New challenges, but also new opportunities. New types of information are part of institutional business. Management procedures are well defined for paper records, but undeveloped for hybrid or digital collections. Information is increasingly produced in digital form. Increasing amount of content, but does it all have value in the short and long term?
  • a joint project with AIM and the fact you have a professional archivist and records manager on the team.
  • Introduce audit framework as a method of understanding the types of information that are being produced. To do this, we sought to examine the data assets being produced and developed an assessment strategy that defines the process for performing an audit within the college. It consists of 6 stages and combines sections of Data Audit Framework (DAF), DRAMBORA, DIRKS & other audit work.
  • Selected academic units that had data of particular value and/or identified as being at risk.
  • A second aspect of the audit work was to develop an understanding of the lifecycle of a digital object.
  • Applied DRAMBORA frameworks to identify and assess risks in data creation/management lifecycle
  • Storage - Potential for data loss in future
  • Project team identified several actions necessary to reduce likelihood of risks occurring: Policies: Review collection and curation policies for
  • The primary method that we intend to use to mitigate many of the risks is through the implementation of a
  • OAIS Reference model provides a standardised method of expressing the components of a digital repository – Ingest, Data Management, Archival storage, Administration, Preservation Planning and Access. These components process the digital object at each stage of the workflow, changing it from a Submission Information Package into an Archival Package suitable for preservation and a Dissemination Package suitable for access & use.
  • Submission component: Alfresco CMS Community Edition provides submission interface and data processing functionality. Archive component: Fedora Commons as back-end data archive. Access component: evaluating Muradora to provide web front-end for accessing archived content.
  • Actions may be performed in sequence or in parallel. jBPM may be used to
  • Majority of automated actions will be performed on deposit, once data has been submitted to archive
  • Authoritative Records: ISO 15489, Information and Documentation -- Records Management, describes an authoritative record as a record that has the characteristics of authenticity, reliability, integrity and usability. As explained in ISO 15489, the aim of all records management systems should be to ensure that records stored within them are authoritative. In summary, an authoritative record: can be proven to be what it purports to be; can be proven to have been created or sent by the person purported to have created or sent it; can be proven to have been created or sent at the time purported; can be depended on because its contents can be trusted as a full and accurate representation of the transactions, activities or facts to which it attests; is complete and unaltered; can be located, retrieved, presented and interpreted. Collection held in Alfresco until complete. It is only transferred into Fedora when one of two parameters is met. Data files deleted from Alfresco.

    1. 1. Who Decides? Reinterpreting archival processes for the management of digital research Gareth Knight Centre for e-Research, King’s College London
    2. 2. Presentation Themes
        • Need to re-appraise definition & criteria for a Record
        • Challenges posed when attempting to archive digital records
        • Technical architecture and processes required to manage digital records
    3. 3. What is a record?
        • “information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business”
          ISO 15489-1:2001. Records management
        • “a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity”
          The International Council on Archives (ICA) Committee
    4. 4. What is a record?
        • “information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business”
          ISO 15489-1:2001. Records management
        • “a specific piece of recorded information generated, collected or received in the initiation, conduct or completion of an activity and that comprises sufficient content, context and structure to provide proof or evidence of that activity”
          The International Council on Archives (ICA) Committee
    5. 5. What is a Record today? What is a Record tomorrow?
        • Criteria for defining records that have archival value are subject to a range of factors and may change over time:
        • Legal obligation:
            • Varies between institutions of different types
            • Geographic location
            • E.g. Public Records Act 1958 (government), Data Protection Act, Freedom of Information (FoI) Act 2000, Environmental Information Regulations (2004), etc.
        • Proof of business:
            • What type of business? Commercial, non-commercial
            • Research as a business activity
        • Proof of activity:
            • May be broad. Influenced by function, evidential value, uniqueness
        • Memory of institution:
            • Same as above – requires consideration of function, value, uniqueness and other criteria
    6. 6. Re-evaluating Records management at King’s College London
        • College Archives acquires, preserves and makes available all material of long-term, evidential and research interest that forms part of the College’s heritage
        • Acquisition policy:
            • Paper and electronic records of day-to-day business operation, private papers of academics and researchers and material related to College heritage
        • Preservation policy:
            • Preserve archives in their original physical format (storage, packaging).
            • Maintain in appropriate environmental conditions – in compliance with BS 5454:2000
    7. 7. New archiving challenges
        • New types of information:
            • Published research papers, research data – funder requirements for data management
        • Creation of increasingly diverse content types:
            • Hybrid (paper+digital), digital content
            • CAD designs, interactive resources, datasets. Publication of dynamic content
        • Update frequency:
            • Web site, blogs, Twitter, object revisions. Versioning
        • Access lifecycle:
            • Technology dependencies – software & hardware
        • Uncertain value:
            • What is the business value? What period of time?
    8. 8. Preservation Exemplars at King’s (PEKin)
        • Project objectives:
            • Develop a management system capable of handling digital business records AND research material
            • Adopt a management strategy that brings together archival AND data curation approaches
            • Embed preservation practices within the institution through a multi-layered strategy:
                • work with central services to develop a preservation strategy and service for digital records
                • work with academic units and professional services to ensure local data producers and systems managers are provided with targeted advice, guidance and tools to support decision-making
        • Project Partners:
            • Centre for e-Research (CeRch) & Archives & Information Management (AIM) at King’s College London
        • Funder:
            • JISC Information Environment 09-11 preservation exemplars strand
    9. 9. Audit framework PEKin audit framework combines sections of DAF, DRAMBORA, DIRKS & other audit work
    10. 10. Audit management practices
        • Purpose of audit was to identify:
            • Functions within the organisation that create records
            • Who uses records and for what purpose
            • Location and responsibility for storage
            • Time period they are currently / should be retained for
            • Future stakeholders that need/may wish to use records
        • Survey academic and business units
        • Core business units:
            • College Estates, Student Records, College Committees
        • Research groups:
            • Twins Early Development Study, Regional Information Collection Centre, Environment Research Group, Randall Division of Cell and Molecular Biophysics
    11. 11. Types of digital information
        • Business records:
            • Estates records – Property records, contract for building maintenance, Computer Aided Design (CAD)
            • Student Records – Students, courses, grades
            • Committee Records – Structure, operation
        • Research records:
            • Commercial – research data created for commercial purposes, e.g. pollution monitoring, patents
            • Funded research – Contracts awarded by funding bodies
            • Unfunded research – Academic researcher who has interest in topic
        • Each record has different value and retention period.
    12. 12. Many types of lifecycle
        • Record lifecycle (variants: Information, data lifecycle)
        • Access lifecycle (e.g. digital lifecycle)
    13. 13. Analysis of lifecycle risks
        • Identify & evaluate risks that occur in the lifecycle
        • Applied a ‘light touch’ DRAMBORA methodology to case studies. Influenced by DIRKS
        • Risk categories:
            • Organisation Management, Staff, Tech Infrastructure, Acquisition & Ingest, Preservation & Storage, Access & Dissemination
        • Risk Description:
            • Definition, manifestations, consequences, severity (risk impact x probability), mitigation strategies
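The severity measure on this slide (risk impact x probability) can be illustrated with a small risk register. This is only a sketch, not the project's actual tooling: the `Risk` class, the 1 to 5 scoring scales and the example scores are assumptions made for illustration, while the risk names and categories are borrowed from the slides.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical risk register."""
    name: str
    category: str
    impact: int       # assumed scale: 1 (negligible) to 5 (severe)
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)

    @property
    def severity(self) -> int:
        # Severity as given on the slide: risk impact x probability.
        return self.impact * self.probability


def rank_risks(risks):
    """Return the risks ordered from most to least severe."""
    return sorted(risks, key=lambda risk: risk.severity, reverse=True)


# Illustrative scores only; they are not taken from the PEKin audit.
risks = [
    Risk("Insufficient storage capacity", "Preservation & Storage", 4, 3),
    Risk("Unidentified change to a record", "Preservation & Storage", 5, 2),
    Risk("Unclear retention period", "Organisation Management", 3, 4),
]

for risk in rank_risks(risks):
    print(f"{risk.severity:>2}  {risk.category}: {risk.name}")
```

Ranking by a single product is a simplification; a fuller register would also record the manifestations, consequences and mitigation strategies listed on the slide.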
    14. 14. Recognised risks
        • Storage:
            • Insufficient capacity: local drives, network drives, 3rd party server
        • Authenticity & integrity:
            • Unidentified/unknown change. Some staff rely upon print-outs of digital original
        • Archival value and retention period:
            • Different criteria & quality thresholds
            • Business records – Recognised legal value & retention period
            • Research data – Archival value of research papers understood. Retention period of data has, until recently, not been recognised
        • Access and usage:
            • Business records have well-defined period of primary use, but unrecognised secondary use
            • Research papers understood, but do not always consider datasets & other outputs
    15. 15. Risk management Strategy
        • Storage and management infrastructure:
            • Technical infrastructure to store and manage their data
        • Education:
            • Develop staff understanding of data management and archival principles
                • Topics include: Authenticity and integrity, assessment of archival value
                • Methods: Practical documentation on data creation/management, training events
        • Policies:
            • Type of record collected, time period for collection, appraisal criteria for long-term retention
        • Procedures:
            • Data capture, curation, preservation
        • Developed with consideration of cost implications
    16. 16. KCL Archives Preservation Repository
        • A preservation repository for college data of short/long-term value:
        • Standards compliance:
            • OAIS Reference Model, TRAC, ISO 15489
        • Bitstream preservation:
            • Fixity creation/verification, online + offline storage
        • Information Content Preservation:
            • Format conversion, event logging – audit trail
        • Access:
            • Limited to archive reading room, catalogue descriptive MD to common standard
        • Interoperable:
            • Interact with other college & public systems, e.g. student records
    17. 17. OAIS Reference Model
    18. 18. Technical Infrastructure
    19. 19. Alfresco Actions
        • Actions – a parameterized unit of work that can be applied to a node
        • Parameters – rules for action execution and type to be performed
        • jBPM synchronous or asynchronous workflows
        • Actions may be performed at different stages of workflow
    20. 20. Ingest Actions
        • Content model compliance
            • Conforms to defined structure & object types
        • Fixity generation
            • All: MD5, SHA-1, CRC
        • Format identification
            • All: File(1), DROID
        • Technical metadata extraction
            • Format specific: JHOVE, MP3Info, others
        • Specification conformance
            • Branch workflow according to threshold
        • Conversion to preservation & dissemination derivative
            • Parameters for each format & licence
            • OpenOffice, ImageMagick, SoX
        • Data Packaging
            • Generate METS package, record action results as PREMIS Event
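The fixity generation step could look roughly like the sketch below, which computes the three values named on the slide in a single pass over a file. It uses only the Python standard library and is not the Alfresco action used in the project. The function name `generate_fixity` is hypothetical, and reading "CRC" as CRC-32 is an assumption.

```python
import hashlib
import io
import zlib


def generate_fixity(stream, chunk_size=1024 * 1024):
    """Compute MD5, SHA-1 and CRC-32 for a binary stream in one pass.

    Reading in chunks keeps memory use flat for large deposits.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
        sha1.update(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC carried between chunks
    return {
        "MD5": md5.hexdigest(),
        "SHA-1": sha1.hexdigest(),
        "CRC32": format(crc & 0xFFFFFFFF, "08x"),  # assumed meaning of "CRC"
    }


# Example with an in-memory stream; a real deposit would use open(path, "rb").
fixity = generate_fixity(io.BytesIO(b"example record content"))
for algorithm, value in fixity.items():
    print(f"{algorithm}: {value}")
```

Recording more than one digest means a later verification does not depend on a single algorithm.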
    21. 21. Archiving actions
        • Transfer into Fedora archive:
            • When collection closed (e.g. all papers submitted for meeting collection)
            • After specified time period, e.g. 3 months
        • Fixity verification:
            • Confirm that fixity is unchanged
        • Manual activity for future date:
            • Anonymisation in 2 years
            • Re-appraisal in xx years – Retain or remove
        • Obsolescence monitoring?
            • Possible future implementation
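The two transfer conditions and the fixity check can be written as simple predicates. This is a sketch under stated assumptions, not the Alfresco or Fedora implementation: the function names are hypothetical, and the 90-day constant is an assumed stand-in for the "3 months" example on the slide.

```python
from datetime import datetime, timedelta

# Assumed stand-in for the slide's "3 months" example.
HOLD_PERIOD = timedelta(days=90)


def ready_for_transfer(closed, created_at, now, hold_period=HOLD_PERIOD):
    """A collection is transferred when either condition is met:
    it has been closed, or the specified time period has elapsed."""
    return closed or (now - created_at) >= hold_period


def fixity_unchanged(recorded, current):
    """Confirm every digest recorded at ingest still matches.

    Both arguments map algorithm names to hex digests. An empty
    record counts as a failure: there is nothing to verify against.
    """
    return bool(recorded) and all(
        current.get(algorithm) == value
        for algorithm, value in recorded.items()
    )


created = datetime(2010, 1, 4)
print(ready_for_transfer(False, created, datetime(2010, 2, 1)))  # still held
print(ready_for_transfer(True, created, datetime(2010, 2, 1)))   # closed early
print(fixity_unchanged({"MD5": "abc"}, {"MD5": "abc"}))
```

Treating a missing or empty fixity record as a failed check is a design choice in this sketch, so that an object is never reported as verified by default.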
    22. 22. Content Models
        • Content models define rules that govern collection structure, data type & behaviour
        • Preservation Archive Content Models designed with consideration of resource type and Alfresco & Fedora capabilities
        • Content model for each resource type composed of 3 layers
        • Each layer is a Fedora Object that contains different metadata
        • Layers: Filing Cabinet, Drawer, Folder
    23. 23. Content Model examples
        • Each item represents a Fedora Object
        • Each FO holds user provided metadata
        • Different MD required at each layer
        • Layers: Filing Cabinet, Drawer, Folder
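The three-layer content model can be sketched as a small data structure in which each object knows its layer and reports which required metadata it lacks. This is an illustration only and does not use the Fedora or Alfresco APIs. The `ArchiveObject` class and the metadata field names are hypothetical, and the assumption that the layers nest (Filing Cabinet, then Drawer, then Folder) is inferred from the metaphor rather than stated on the slides.

```python
from dataclasses import dataclass, field

# Layer names from the slides, in assumed nesting order.
LAYERS = ("Filing Cabinet", "Drawer", "Folder")

# Hypothetical required metadata for each layer.
REQUIRED_METADATA = {
    "Filing Cabinet": {"title", "owner"},
    "Drawer": {"title", "date_range"},
    "Folder": {"title", "retention_period"},
}


@dataclass
class ArchiveObject:
    """Stand-in for one object at a given layer of the content model."""
    layer: str
    metadata: dict
    children: list = field(default_factory=list)

    def add_child(self, child):
        # A child must belong to the layer directly below its parent.
        position = LAYERS.index(self.layer) + 1
        if position >= len(LAYERS) or child.layer != LAYERS[position]:
            raise ValueError(f"{child.layer!r} cannot sit inside {self.layer!r}")
        self.children.append(child)

    def missing_metadata(self):
        """Return the required fields this object does not yet hold."""
        return REQUIRED_METADATA[self.layer] - set(self.metadata)


cabinet = ArchiveObject("Filing Cabinet", {"title": "College Committees"})
drawer = ArchiveObject("Drawer", {"title": "Meetings", "date_range": "2009-2010"})
cabinet.add_child(drawer)
print(cabinet.missing_metadata())
print(drawer.missing_metadata())
```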
    24. 24. Findings (so far)
        • Data audit & risk analysis provide useful frameworks for analysing data management practices & justifying data archiving
        • No single definition of archival value – different criteria & quality thresholds
        • Application of archival principles to data assets provides demonstrable benefits
        • Duration requirements of business & research data are broadly similar
        • Digital repository architecture provides sufficient flexibility to manage data assets created for different purposes
    25. 25. Contact
        • Gareth Knight
        • Centre for e-Research, King’s College London
        • [email_address]
        • 020 7848 1979
        • Centre for e-Research; Archives and Information Management (AIM)