PDF/A – A standard for document archiving
Dipl. Inf. Reinhold Müller-Meernach, Röttenbach
Dr. Uwe Wächter, Roßdorf
No. 2/2006

SEAL Systems
info@sealsystems.com
www.sealsystems.com
DOCUMENT MANAGEMENT




The »leader of the pack« TIFF/G4 has got competition. With PDF/A, a new standard for the long-term archiving of electronic documents has now been defined. Checks on existing document archives show that a large proportion of the PDF files archived there do not even meet the minimum requirements of the new standard. But this is no longer a reason to panic.

Paper archives have been, and are still being, replaced by digital storage. The number of electronically created documents is growing constantly. For the long-term archiving of these documents, standards are beneficial if well-defined reproducibility and distribution are to be supported over a long period of time. The monochrome raster format TIFF/G4 has been the de facto standard for more than ten years.

Fig. 1: Investigations show that almost no PDF files in existing archives conform to PDF/A. (Fig.: Seal Systems AG, Röttenbach)

For text-heavy documents (such as those from Office applications), Adobe's Portable Document Format, PDF for short, has become established as an application-neutral exchange format. With PDF/A, there is now a standard that defines a subset of the PDF specification in order to make PDF files particularly suitable for archiving.

The ISO standard 19005-1 is based on Adobe's »PDF Reference 1.4«. It makes PDF 1.4 more precise and specifies whether each of its features is mandatory, recommended, restricted or prohibited. This makes it possible to distinguish two levels of PDF/A: level A (PDF/A-1a) and level B (PDF/A-1b).

Level B is important for archiving

Level B deals mainly with preserving the visual appearance of a document over long periods of time. To achieve this, all the information needed for reproduction must be contained in the file itself. This concerns, for example, all text, graphics, images, fonts and colour information. References to external sources, such as other files, images, websites or external fonts, violate the PDF/A standard.

A particularly important characteristic of PDF/A is the embedding of fonts. Only this ensures that a document can be printed in exactly the same way many years later, without depending on the font definitions installed on a computer or printer. PDF can also demonstrate its advantage over TIFF/G4 through its colour display. However, this only conforms to the standard if the PDF file can be reproduced faithfully on any colour printer. To achieve this, device-independent colour definitions are stored in the file and are converted only when printing.

Simple and reliable reproduction can be hindered by protection mechanisms such as encryption, and by certain compression methods and encodings. These techniques are therefore also prohibited for PDF/A-conforming files.

In certain applications, overlapping images are often used deliberately to create particular effects for the viewer. Examples of this are transparency effects, colour mixing and background stamps. Many PDF generation processes cannot reproduce these characteristics 1:1, so they must be avoided in PDF/A.

Secure archiving in line with the standard

Secure archiving in line with the standard means that the saved files remain usable even if the administration system fails. Therefore, PDF/A-conforming files must meet the standard's requirements for metadata.

The Portable Document Format makes it possible to store graphics in several alternative representations at the same time. This allows improved display on different screens (PC, handheld or PDA) or a choice based on the user's language (German or English). However, because it is unclear which representation will actually be reproduced, this function contradicts ISO 19005-1.

Fig. 2: With test and correction procedures for PDF/A, data stocks can be inspected and modified as required. (Fig.: Seal Systems AG, Röttenbach)

Level A additionally standardises, beyond the requirements of level B, the
properties relating to content, structure and semantics. This makes it possible to re-extract parts of a document, or information from it, at a later point in time. Furthermore, this level specifies how characters must be mapped to Unicode. Work is already under way on an extension of this standard, designated 19005-2, which is based on the »Adobe PDF Reference 1.6«.

PDF/A level A covers the complete standard

Every international standard is a compromise between the interest groups concerned and their requirements, which can partly contradict one another. Existing procedures and local regulations must be taken into account; on the other hand, new technical possibilities should not be ruled out either. Specifying every detail to the maximum can make a standard unusable in operational practice. It should therefore be checked whether company standards can be defined that take practicability and compatibility with existing procedures into account. Such a company standard adopts definitions from the ISO standard and combines them with clear instructions that all employees can follow.

Define minimum standards

Experience has shown that even individual industries can agree on a common understanding and procedure.

If a company decides on PDF as a reliable document format for long-term archiving, the next logical question is: is every PDF allowed, or must it satisfy certain minimum requirements? The ISO standard for PDF/A can help to answer this question and to define the minimum standards.

Once this question is answered, the next steps can be taken. It must be clarified which procedures guarantee compliance with these minimum requirements. In addition, it must be decided how to proceed with existing stock. Finally, it must be specified who is responsible for inspecting these processes and ensuring their compliance.

Fig. 3: Test logs provide users and IT managers with information about the quality of the data stock. (Fig.: Seal Systems AG, Röttenbach)

There are now countless software tools for creating PDF files, the best known being Adobe Acrobat. In addition to many conversion applications from third-party suppliers, a number of applications can export PDFs directly. In future, this should also be possible for Microsoft's Office products. However, investigations show that some PDFs created in this way do not even meet the general PDF specification, let alone the stricter ISO 19005-1.

Only in very few cases are PDF files created exclusively within the company using a verified tool. PDF is an exchange format, so there is a high probability that considerable parts of the data stock come from other sources that cannot be checked. Business partners, the internet and e-mail are examples of this. For these reasons, it makes more sense for conformity with the standard to be checked by the responsible body within the archiving organisation.

Fig. 4: PDF/A inspections can be integrated into existing Document Management Systems (DMS) and Product Data Management Systems. (Fig.: Seal Systems AG, Röttenbach)

Test programs are now available that check PDF files for conformity with configurable ISO and company standards. The result of an inspection is always either a confirmation of
conformity or a rejection. In the latter case, a qualified analysis should follow so that the creator can be given targeted instructions.

However, an alternative to rejection can be the automated correction of a PDF file to achieve conformity with the standard. Frequently observed incompatibilities, such as missing font embedding, can be corrected in this way with minimal effort. To safeguard processes, the timing of the conformity inspection is decisive.

The first and best time is clearly during the generation process. For unknown documents or unsecured generation processes, a simple checking tool can be provided on the desktop. Both methods make it easier for the parties concerned to carry out an inspection, but do not force them to do so.

It is therefore recommended that manufacturers and operators of Document Management Systems (DMS), Enterprise Content Management Systems (ECM) and archiving solutions provide a suitable interface through which test methods can be integrated.

If all archiving and conversion processes then pass through this interface, the PDF/A inspection becomes mandatory. A one-off or regular inspection is also recommended for existing PDF archives. A first run provides information about the quality of the data stock, from which the subsequent steps can be derived. Some of the data can be corrected, some cannot.

Fig. 5: The diagram shows the integration of the PDF/A methods into the SAP document management system. (Fig.: Seal Systems AG, Röttenbach)

PDF/A – an archiving format with a future

If the sources are known, it may be possible to provide a new, conforming version. Initial experience with reference data from industrial customers has shown that almost no PDF file met the PDF/A-1b definition.

The most frequent errors are (in this order) missing metadata, missing font embedding, colour management errors and protection mechanisms. However, these weaknesses can be corrected automatically with suitable tools.

The Portable Document Format is becoming more powerful and extensive with every new version. 3D visualisations, form processing, digital signatures, change management and pre-print inspection are only part of the PDF application spectrum. Its established use as a simple exchange format makes it an obvious candidate for an archiving format as well. Here the technical requirements are lower, but the legal ones are higher.

With PDF/A, a standard has now been adopted with which the risks and future expense of long-term archiving can be minimised. There are tools to generate, inspect and adjust PDF/A files. As a result, the new standard will quickly establish itself as a practical alternative.

Fig. 6: PDF/A is classed as a standard with which both the risks and, above all, the expense of long-term archiving can be minimised.
