Long-Term Preservation

Thomas Stensitzki
Agenda



  1        Long-Term preservation

  2        Why should/must items be archived?

  3        What should/must items be archived?

  4        How can archiving be done?




Page  2
Terms
Outsourcing, Filing, Backup, Archiving




 • Outsourcing
   - Data (e.g. of a specific period) is being exported from a source system and
     converted (if required)
   - Outsourced data is not available in the source system
   - Outsourced data can be backed up or archived
   - Importing of outsourced data might require conversion, when the target data
     structure is different

 • Filing
   - Storage of objects in a folder of the file system
   - Filed objects can be backed up or archived depending on their file location




Page  3
Terms
Outsourcing, Filing, Backup, Archiving




 • Backup
   - Copy of existing objects to a storage medium to be able to restore data in the
     case of data corruption or accidental deletion
   - Performed periodically
   - Storage medium is overwritten over time, so older versions of an object
     eventually can no longer be restored
   - Old versions of an object can be restored for a specific period only

 • Archiving
   - Copy of a file or document to an external storage medium
   - Standardized file format (tif, jpg) (if required)
   - Storage for a longer period




Page  4
Terms
Document management vs. Long-term preservation




 • Document management
   - Management of documents being edited using Check-In, Check-Out and
     Versioning
   - Documents can be found by attribute value search or full-text search
   - Attributes and document links are managed by DMS
   - Documents are stored in the file system or a DMS database




Page  5
Terms
Document management vs. Long-term preservation




 • Long-term preservation
   - Auditable and unchangeable storage of completed objects for a long time
   - Copy of objects (e.g. files, documents) to an external storage medium
   - Files and raw data are archived in original format
   - Documents are converted and archived in standardized format (black/white =
     TIF, colour = JPEG or PDF/A)
   - Document lookup via index
   - Archived files and raw data can be provided in original format
   - Archived documents can be provided using a viewer software




Page  6
Terms
Long-term preservation




 • Digital archiving
   - Database-driven, long-term, secure and unchangeable storage of digital
     information objects which are reproducible at any time

 • Digital long-term preservation
   - Storage of digital information for a period longer than 10 years

 • Auditable digital archiving
   - Storage of digital business-related information in accordance with the
     requirements of
       - Handelsgesetzbuch (HGB) § 239, § 257
       - Abgabenordnung (AO) § 146, § 147, § 200
       - GoBS
   - Secure and orderly storage of business-related documents with retention
     periods of six to ten years
Page  7
Why
Sources of documents/objects




 • Documents, lifecycle of documents
   - Creating and editing documents: in process (e.g. DMS, SharePoint)
   - Completed documents: final version of a document
   - Additional editing creates new version

 • Other documents
   - Correspondence, reports, rules, pictures, films, letters, invoices, quotations,
     certificates from different sources

 • Workflows
   - Information from workflow based systems (with digital signatures)
   - Final document can be created from related data as the final workflow step

 • IT systems
   - Raw data is usually available in databases or files

Page  8
Why
Dealing with documents/objects




 • Documents
   - Documents in process and/or final documents are stored in DMS, SharePoint or
     a disk drive (local or network share)
   - Documents stored on network shares are backed up automatically
   - Documents in SharePoint and emails in Outlook are deleted after retention
     period has expired
   - Deleted documents on a network share cannot be restored after the backup
     period has expired
   - Final documents signed by hand are archived in paper and/or scanned to PDF
     and stored as file (attached to an email)




Page  9
Why
Dealing with documents/objects




 • Other documents
   - Emails are deleted from the inbox automatically after retention period has
     expired
   - Reports, images, films, invoices, quotations, certificates, etc. available as files
     are considered documents
   - Paper documents, e.g. correspondence, letters, certificates, etc. are stored in
     file folders




Page  10
Why
Dealing with documents/objects




 • Workflow vs. documents
   - Information created in workflow systems is stored with data of digital signatures
     in databases
   - All data of a finalized workflow is stored digitally within the database (usually),
     final document can be created using a template
   - Print-out is treated as a copy of the original digital document
   - Digitally signed documents are treated equally to documents signed by hand

 • IT systems vs. raw data
   - Raw data is stored in databases or files which grow over time
   - Data can be outsourced or exported to reduce the storage size, but the data is
     not instantly accessible for the application
   - Software manufacturers must guarantee that release changes do not impact the
     capability to import outsourced data

Page  11
Why
Legal and regulatory requirements for archiving




 • Legal requirements for business documents
   - Handelsgesetzbuch (HGB) § 257 regulates which business documents have
     to be archived
   - Legal retention period for business letters is 6 years, for other documents 10
     years
   - Abgabenordnung (AO) §§ 146, 147 describe similar requirements for
     administrative regulations
   - Digital archiving of those documents must comply with the principles of proper
     accounting (GoB) and with GoBS, which describes the requirements for process
     documentation
   - Process documentation is the proof of correct operation of the system and
     describes the overall organizational and technical process of archiving
     (collection, indexing, storage, retrieval, protection against loss / corruption and
     reproduction of archived information)
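The retention periods above can be turned into a concrete deadline. The following is a minimal sketch, not a legal reference: the function name, the document kinds and the rule that the period starts at the end of the calendar year of creation are assumptions made for illustration and should be confirmed with the Legal department.

```python
from datetime import date

# Retention periods in years, as stated on the slide.
RETENTION_YEARS = {"business_letter": 6, "other": 10}


def retention_end(created: date, doc_kind: str) -> date:
    """Return the last day on which the document must still be retained.

    Assumption: the period starts at the end of the calendar year in
    which the document was created.
    """
    years = RETENTION_YEARS[doc_kind]
    return date(created.year + years, 12, 31)
```

For example, under these assumptions a business letter created on 14 March 2015 would be retained until 31 December 2021.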


Page  12
Why
Legal and regulatory requirements for archiving




   - Digitally signed documents are as legally binding as conventional paper
     documents
   - Each country has different requirements depending on the business of the
     company (e.g. Sarbanes-Oxley Act regarding internal controls)
   - Subject to audits and inspections




Page  13
Why
Legal and regulatory requirements for archiving




 • Industry-specific requirements for documentation / archiving
   - Gefahrengutverordnung (GGAV)
   - Environmental liability and product liability law
   - Operational directives and regulations
   - Good Practice quality guidelines and regulations
   - etc.


   Agree on the archiving process with internal departments (QA, Legal,
   Controlling) and, where necessary, with the authorities




Page  14
What
Retention policies for information life-cycle in Outlook and SharePoint




 • Recommendations
    Outlook folder                  Retention period
    Inbox                           60 days
    Other folders, Sent Items,      2 years
      Drafts, Outbox
    Deleted items                   7 days
    Calendar, Tasks                 2 years
    Contacts                        Duration of employment

    Classes in SharePoint           Retention period
    Standard                        2 years
    Review                          7 years
    Long-Term                       10 years
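A retention table like the one above maps directly onto a lookup. The sketch below is illustrative only: the dictionary, the function name and the simplification of "2 years" to 730 days are assumptions, and it does not reflect how Outlook or SharePoint apply retention policies internally.

```python
from datetime import date, timedelta

# Recommended Outlook retention periods from the slide, in days.
# "2 years" is simplified to 730 days for this sketch.
OUTLOOK_RETENTION_DAYS = {
    "Inbox": 60,
    "Sent Items": 730,
    "Drafts": 730,
    "Outbox": 730,
    "Deleted Items": 7,
    "Calendar": 730,
    "Tasks": 730,
}


def is_expired(folder: str, item_date: date, today: date) -> bool:
    """Return True if an item in the given folder has outlived its retention period."""
    limit = OUTLOOK_RETENTION_DAYS.get(folder)
    if limit is None:
        # Folders without a fixed period (e.g. Contacts, kept for the
        # duration of employment) are never expired by this check.
        return False
    return today - item_date > timedelta(days=limit)
```

An item that sits in the Inbox for 74 days would be flagged as expired, while a contact entry would not.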




Page  15
What
Which documents and data




 • Business units determine
   - Which documents have to be archived, how, and for how long
     (storage form, file plan, retention periods)
   - Document classes (logical archive)
   - Document types
   - Index data




Page  16
What
Requirements




 • Requirements for long-term preservation are specified by the
   business
   - Processes, workflows, interfaces
   - Documents, objects, source, meta data
   - Archiving period
   - Regulatory aspects
   - Permissions, roles, user management, responsibilities
   - Purpose of archiving (e.g. display of documents in 15 years)
   - Confidentiality, data integrity, sensitive data, availability
   - Capacity (data volume, number of users, performance)
   - etc.




Page  17
What
Meta data




 • Meta data provides structured index and search capabilities for
   archived objects
   - Source of meta data (e.g. master data systems)
   - Who maintains the master data?
   - Shall meta data be selected or manually entered?
   - Is meta data document-dependent?
   - Is meta data transferred automatically from other systems?
   - Is an audit-trail required? (Who has changed which meta-data, when, why)


   Coordination of the meta data in early stages is highly recommended
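If an audit trail is required, each change to the index data can be recorded together with who changed it, when and why. The following is a minimal sketch; the class names, field names and the in-memory list are hypothetical and stand in for whatever the archiving system actually provides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class MetadataChange:
    """One audit-trail entry: who changed which meta data, when, and why."""
    field_name: str
    old_value: Optional[str]
    new_value: str
    changed_by: str
    reason: str
    changed_at: datetime


@dataclass
class ArchiveRecord:
    """Index data of one archived object plus its audit trail."""
    document_id: str
    index: Dict[str, str] = field(default_factory=dict)
    audit_trail: List[MetadataChange] = field(default_factory=list)

    def set_index(self, name: str, value: str, user: str, reason: str) -> None:
        """Update one index field and append the change to the audit trail."""
        change = MetadataChange(
            field_name=name,
            old_value=self.index.get(name),
            new_value=value,
            changed_by=user,
            reason=reason,
            changed_at=datetime.now(timezone.utc),
        )
        self.index[name] = value
        self.audit_trail.append(change)
```

The archived object itself stays untouched; only the index data changes, and every change leaves a trace.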




Page  18
What
Requirements




 • If raw data has to be archived
   - Raw data is stored as is, bit-wise
   - Primary goal is the ability to import raw data as 1:1 copy of the original data
   - IT system generating raw data must be able to handle imported raw data even
     after a long time
   - Format of raw data must be coordinated
   - Software manufacturers must guarantee that release changes do not impact the
     capability to import outsourced data
   - Meta data must be defined
   - Processing of long-term preserved raw data is the responsibility of the
     generating IT system, not of the archiving system
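One common way to show that raw data is still a 1:1 copy is to record a checksum at archiving time and compare it on retrieval. The sketch below assumes SHA-256 and plain files; the function names are hypothetical and the slides do not prescribe a particular algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at archiving time."""
    return sha256_of(path) == recorded_digest
```

The digest would be stored with the meta data of the archived object, so that the check can be repeated at any later time.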




Page  19
How
Technical aspects




 • Selection of eligible file formats
   - Should the document be displayed as original incl. embedded graphics?
   - Should the original document properties be reproduced (paper size, font size,
     header, footer, logos, color, hand-written notes, etc.)?
   - Should documents be archived in different formats but with same content (e.g.
     XML and graphic)?
   - Legal requirements?
   - Is “loss of information” acceptable when converting into graphical
     representations (JPEG)?
   - Is the conversion process revision-safe?
   - Is the archived document format suitable for the archiving period?




Page  20
How
BSI approved formats




 • Graphics
   - TIFF, storage of screened black-white images
   - JPEG, storage of colour and gray scale images


 • Structure formats
   - XML, can be used for long-term preservation of digital documents
     Schema and layout have to be archived as well
   - PDF/A, subset of PDF, standardized for long-term preservation
     Format with structure and layout information and graphical objects
     Documents must be validated to be PDF/A compliant
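The format choices above can be written down as a simple decision rule. This is only a sketch of the mapping given on the slides; the function name and the two content categories are assumptions, and the XML option (which needs schema and layout archived as well) is left out.

```python
def archive_format(content: str, colour: bool = False) -> str:
    """Pick a target format for archiving.

    content: "image" for scanned or graphical objects,
             "document" for documents with structure and layout.
    colour:  True for colour or gray scale images.
    """
    if content == "document":
        # PDF/A output still has to be validated for compliance.
        return "PDF/A"
    if content == "image":
        return "JPEG" if colour else "TIFF"
    raise ValueError(f"unknown content type: {content!r}")
```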




Page  21
How
Storage media




 • Possible storage media
   - Paper
   - Microfilm
   - Magnetic tapes, floppy disks
   - Optical storage media (e.g. CD-R, CD-ROM, DVD, WORM)
   - Hard drives
   - etc.



Selected media types have a limited lifetime and durability. Long-term
preserved objects must be copied to new media unchanged, if
required due to technology related changes in the storage media.
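Copying objects to new media "unchanged" implies checking the copy against the original. A minimal sketch using only the Python standard library is shown below; the function name and the directory layout are hypothetical, and a real migration would also carry over meta data and log the result.

```python
import filecmp
import shutil
from pathlib import Path


def migrate_object(source: Path, target_dir: Path) -> Path:
    """Copy one archived object to new media and verify the copy byte by byte."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copies content and file timestamps
    # shallow=False forces a comparison of the file contents,
    # not just size and modification time.
    if not filecmp.cmp(source, target, shallow=False):
        target.unlink()
        raise IOError(f"copy of {source} differs from the original")
    return target
```

The source object should only be retired once the comparison has succeeded.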


Page  22
How
Additional topics




   - Storage of sensitive data
   - Restart of the archiving system after system outage in a disaster
   - Integration in current IT environment
   - Migration of archived objects is expensive depending on data volume
   - User management
   - Usage of storage media must be regulated
   - Firewall based separation of archiving system
   - A long-term archiving solution will be in use for a long time; supplier
     selection should take this into account




Page  23
How
Pros & Cons




   Pros
   - Single storage of documents/objects
   - Saves storage space
   - Documents/objects available to authorized persons
   - Documents/objects available from every workplace
   - Structured search of documents/objects

   Cons
   - Usage of source documents must be regulated
   - Personnel must be trained (end user, administrator)
   - Ongoing maintenance costs
   - Complex IT system and IT infrastructure required




Page  24
Do You Have
            Any Questions?
            We would be happy to help.
            http://www.sf-tools.net
            Info@sf-tools.net




Page  25
