a standard for document archiving


PDF/A – A standard for document archiving
Dipl. Inf. Reinhold Müller-Meernach, Röttenbach
Dr. Uwe Wächter, Roßdorf
No. 2/2006
SEAL Systems, info@sealsystems.com, www.sealsystems.com
DOCUMENT MANAGEMENT

The »leader of the pack« TIFF/G4 has got competition. With PDF/A, a new standard for the long-term archiving of electronic documents has now been defined. Checks on existing document archives show that a large proportion of the PDF files archived there do not even meet the minimum requirements of the new standard. But this is no longer a reason to panic.
Paper archives have been, and are being, replaced by digital storage. The number of electronically created documents is growing constantly. For the long-term archiving of these documents, standards are beneficial if well-defined reproducibility and distribution are to be supported over a long period of time. The monochrome raster format TIFF/G4 has been the de facto standard for more than ten years. For text-laden documents (such as those from Office applications), Adobe's Portable Document Format, PDF for short, has become established as an application-neutral exchange format. With PDF/A, there is now a standard that defines a subset of the PDF specification to make PDF files particularly suitable for archiving.

Fig. 1: Investigations show that almost no PDF files in existing archives conform to PDF/A. (Fig.: Seal Systems AG, Röttenbach)

The ISO standard 19005-1 is based on Adobe's »PDF Reference 1.4«. It makes PDF 1.4 more precise and defines whether each of its properties is obligatory, recommended, limited or prohibited. This makes it possible to distinguish two levels of PDF/A: a (PDF/A-1a) and b (PDF/A-1b).

Level B is important for archiving

Level B deals mainly with preserving the visual appearance of a document over long periods of time. To do this, all the information needed for its reproduction must be contained in the file itself. This concerns, for example, all texts, graphics, images, fonts and colour information. References to external sources, such as further files, images, websites or external fonts, contradict the PDF/A norm.

An especially important characteristic of PDF/A is the embedding of fonts. Only this can ensure that a document can be printed in exactly the same way after many years, without having to rely on font definitions installed on a computer or printer.

PDF can also demonstrate its advantage over TIFF/G4 through its colour display. However, this only conforms to the standard if the PDF file can also be printed directly on any colour printer. To do this, device-independent colour definitions are saved in the file and are only converted at print time.

Protection mechanisms, certain compression methods and encodings can prevent simple and reliable reproduction. These techniques are therefore also prohibited in PDF/A-conforming files.

In certain applications, overlapping images are frequently used to produce particular effects for the viewer. Examples of this are transparency effects, colour mixing and background stamping. Many PDF generation processes cannot represent these characteristics 1:1, so with PDF/A they must be avoided.

Secure archiving in line with the norm

Secure archiving in line with the norm means that the saved files can still be used even if the administration system fails. PDF/A-conforming files must therefore carry their own metadata.

The Portable Document Format makes it possible to save graphic displays in several alternative representations simultaneously. This allows an improved display on different screens (PC, handheld or PDA) or a user-specific variant (German or English). However, as it is unclear which representation will be reproduced with this method, this function contradicts ISO 19005-1.

Fig. 2: With test and correction procedures for PDF/A, data stocks can be viewed and modified as required. (Fig.: Seal Systems AG, Röttenbach)

Level A additionally standardises, on top of level B, characteristics that define the properties for content, structure and semantics. This makes it possible to re-extract parts of, or information from, the PDF documents at a later point in time. Furthermore, this level specifies how fonts must be mapped to Unicode. Work is already under way on an extension of this norm, named 19005-2, which is based on the »Adobe PDF Reference 1.6«.

PDF/A level A covers the complete norm

Every international norm is a compromise between the interest groups concerned and their requirements, which can be partly contradictory. Existing procedures and local regulations should be taken into account; on the other hand, new technical possibilities should not be ruled out either. Specifying every detail to the maximum can make a standard unusable in operational practice. It must therefore be checked whether company standards can be defined that take practicability and compatibility with existing procedures into account. Such a company standard takes over definitions from the ISO norm and combines them with comprehensible instructions for action that all company members can use.

Define minimum standards

The past has shown that even individual industries can agree on a common understanding and procedure. If a company decides on PDF as a reliable document format for long-term archiving, the next logical question is: is every PDF allowed, or must it satisfy certain minimum requirements? When answering this question and defining the minimum standards, the ISO norm for PDF/A can help.

Once this question is answered, the next steps can be taken. It must be clarified which procedures guarantee that these minimum requirements are complied with. In addition, it must be decided how to proceed with existing old stock. And finally, it must be specified who is responsible for inspecting these processes and ensuring their compliance.

Fig. 3: Test logs provide users and IT managers with information about the quality of the data stock. (Fig.: Seal Systems AG, Röttenbach)

There are now countless software tools for creating PDF files. The best known is Adobe Acrobat. As well as many conversion applications from third-party suppliers, a number of applications can export PDFs directly. In future, this should also be possible for Microsoft's Office products. However, investigations show that some PDFs created in this way do not even meet the standard PDF specification, and so certainly fall short of the stricter ISO 19005-1.

Only in very few cases are PDF files created exclusively within the company using a verified tool. PDF is an exchange format, so there is a high probability that considerable parts of the data stock stem from other, uncheckable sources. Business partners, the internet and e-mails are examples of this. For these reasons, it makes more sense for compliance with the standard to be checked by the responsible body within the archiving organisation. Nowadays, there are test programs with which PDF files can be inspected for compatibility with configurable ISO and company standards. The result of an inspection is always either a confirmation of conformity or a rejection. In the latter case, a qualified analysis should take place so that the creator can be given targeted instructions.

Fig. 4: PDF/A inspections can be integrated into existing Document Management Systems (DMS) and Product Data Management Systems. (Fig.: Seal Systems AG, Röttenbach)

Rejection is not the only option, however: a PDF file can also be corrected automatically to norm conformity. Frequently observed incompatibilities, such as missing font embedding, can be corrected in this way with minimal effort.

To safeguard processes, the timing of the conformity inspection is decisive. The first and best time is definitely the generation process. For unknown documents or unsecured generation processes, a simple checking procedure is provided on the desktop. Both methods make it easier for the parties concerned to carry out an inspection, but do not force them to do so. It is therefore recommended that manufacturers and operators of Document Management Systems (DMS), Enterprise Content Management Systems (ECM) and archiving solutions provide a suitable interface through which test methods can be integrated. If all archiving and conversion processes then go through this interface, the PDF/A inspection becomes obligatory.

Even for existing PDF archives, a one-off or regular inspection is recommended. A first run provides information about the quality of the data stock, and the subsequent steps can be derived from it: part of the data can be corrected, another part cannot.

PDF/A – an archiving format with a future

If the sources are known, it may be possible to provide a new norm-conforming version. Initial experience with reference data stocks from industrial customers has shown that almost no PDF file met the PDF/A-1b definition. The most frequent errors are (in this order) missing metadata, missing font embedding, colour management and protection mechanisms. However, these weaknesses can be corrected automatically with suitable tools.

Fig. 5: The diagram shows the integration of the PDF/A methods into the SAP document management system. (Fig.: Seal Systems AG, Röttenbach)

The Portable Document Format is becoming more powerful and extensive with every new version. 3D visualisations, form processing, digital signatures, change management and pre-print inspection are only part of the PDF application spectrum. Its widespread use as a simple exchange format makes it a natural candidate for an archiving format, too. Here the technical requirements are lower, but the legal ones are higher. With PDF/A, a norm has now been passed with which the risks and future expense of long-term archiving can be minimised. There are tools to generate, inspect and adjust PDF/A files. As a result, the new standard will quickly become established as a practical alternative.

Fig. 6: The data format PDF/A is classed as a norm with which both the risks and, above all, the expense of long-term archiving can be minimised.