SCAPE
Johan van der Knijff
Koninklijke Bibliotheek – National Library of the Netherlands
DPC, PDF/A-3 Briefing, Leeds, 13....
Part 1: Embedded files
PDF/A-3: embedding of any file (type)
Key point:
Use of “embedded files” really means
“embedded file streams” = specific data
structure in PDF!
File specification dictionary
31 0 obj
<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>
endobj
EF key
points to embed...
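The relationship between the two objects can be illustrated with a minimal Python sketch. It assumes the file specification dictionary is available as plain text, exactly as shown on the slide; the function name `parse_filespec` is hypothetical, and the regular expressions ignore cases such as escaped parentheses in the file name. This is an illustration only, not a PDF parser.

```python
import re

def parse_filespec(dictionary_text):
    """Return (file name, object number, generation) from a Filespec dictionary.

    Hypothetical helper for illustration; returns None if the text does not
    look like a file specification dictionary with an /EF entry.
    """
    if not re.search(r"/Type\s*/Filespec\b", dictionary_text):
        return None
    # /F holds the file name as a literal string in parentheses.
    name = re.search(r"/F\s*\(([^)]*)\)", dictionary_text)
    # /EF is a dictionary whose /F entry is an indirect reference
    # ("32 0 R") to the embedded file stream object.
    ref = re.search(r"/EF\s*<<\s*/F\s+(\d+)\s+(\d+)\s+R", dictionary_text)
    if not (name and ref):
        return None
    return name.group(1), int(ref.group(1)), int(ref.group(2))

slide_example = "<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>"
print(parse_filespec(slide_example))
```

The returned object number is the one under which the embedded file stream itself is stored.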
Embedded file stream
32 0 obj
<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>
stream
…SVG Data…
endstream
end...
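The `/Subtype` of an embedded file stream is a PDF name, in which `#2F` is the hex escape for `/`, so `image#2Fsvg+xml` stands for the media type `image/svg+xml`. The following Python sketch decodes such names and lists the subtypes of embedded file streams found in raw PDF bytes. The function names are hypothetical, and the scan assumes uncompressed objects in which `/Type` precedes `/Subtype`, as in the slide example; a real tool would use a proper PDF parser.

```python
import re

def decode_pdf_name(raw):
    """Decode #xx hex escapes in a PDF name (given without the leading slash)."""
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda m: chr(int(m.group(1), 16)), raw)

def embedded_file_subtypes(pdf_bytes):
    """List decoded /Subtype values of /EmbeddedFile stream dictionaries.

    Simplistic byte scan for illustration: it misses objects stored in
    compressed object streams and dictionaries with a different key order.
    """
    pattern = re.compile(
        rb"/Type\s*/EmbeddedFile\b[^>]*?/Subtype\s*/([^\s/<>\[\]()]+)")
    return [decode_pdf_name(m.group(1).decode("latin-1"))
            for m in pattern.finditer(pdf_bytes)]

sample = (b"32 0 obj\n"
          b"<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>\n"
          b"stream\n(placeholder data)\nendstream\nendobj\n")
print(embedded_file_subtypes(sample))
```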
Uses of embedded file streams
File attachments not meant to be rendered by
viewer
File attachment annotation
EmbeddedFiles entry in name dictionary
PDF/A-3
Rendered in/by PDF viewer
Rendition actions
Screen annotations
PDF/A-3
What about inline images?
Not based on “embedded file stream”, but on
“Image XObject” data structure (allows
limited set of pre-defined formats)
Wha...
No impact on content that is meant to be
rendered by PDF viewer
But PDF/A-3’s may contain file of any possible
format as a...
Part 2: JPEG 2000
Supported since PDF/A-2
Image XObject
1614 0 obj
<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB
/BitsPerComponent 8/Interpolate true/L...
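In an Image XObject, the `/Filter /JPXDecode` entry is what marks the stream data as JPEG 2000. As a rough illustration, the Python sketch below scans raw PDF bytes for image objects that carry this filter. The function name `find_jpx_images` and the sample bytes are hypothetical; the scan assumes uncompressed objects and could be confused by binary image data that happens to contain the keyword `endobj`, so it is a sketch rather than a reliable detector.

```python
import re

# Matches "<number> <generation> obj ... endobj" in raw PDF bytes.
OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)

def find_jpx_images(pdf_bytes):
    """Return object numbers of Image XObjects that use the JPXDecode filter."""
    hits = []
    for m in OBJ.finditer(pdf_bytes):
        # Only inspect the dictionary, i.e. everything before the stream data.
        header = m.group(3).split(b"stream", 1)[0]
        is_image = re.search(rb"/Subtype\s*/Image\b", header)
        # The filter may be a single name or the first entry of an array.
        is_jpx = re.search(rb"/Filter\s*\[?\s*/JPXDecode\b", header)
        if is_image and is_jpx:
            hits.append(int(m.group(1)))
    return hits

sample = (b"1614 0 obj\n"
          b"<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB\n"
          b"/BitsPerComponent 8/Interpolate true/Length 5278\n"
          b"/Filter/JPXDecode>>\n"
          b"stream\n(placeholder data)\nendstream\nendobj\n")
print(find_jpx_images(sample))
```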
ISO 19005-2 (PDF/A-2):
JPEG 2000 support based on subset of JPEG
2000 Part 2 (JPX baseline)
Only Part 1 of the standard (J...
JP2 vs JPX
JP2
JPX
JPEG 2000 Part 1:
Basic still image format
JPEG 2000 Part 2:
= JP2 + assorted
advanced stuff …
Fragmented codestreams
Allowed in JPX Baseline!
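One way to check whether a given JPEG 2000 file is affected is to walk its top-level boxes. The Python sketch below reads the brand and compatibility list from the File Type box (`ftyp`) and reports whether a Fragment Table box (`ftbl`), which JPX uses for fragmented codestreams, is present. The function names are hypothetical, and the sketch assumes the input is the complete JP2/JPX byte sequence (for example, image stream data extracted from a PDF); it inspects top-level boxes only.

```python
import struct

def iter_boxes(data):
    """Yield (box_type, payload) for each top-level box in JP2/JPX data."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:
            # Extended length: a 64-bit size follows the box type.
            if pos + 16 > len(data):
                break
            length = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif length == 0:
            # A length of 0 means the box runs to the end of the data.
            length = len(data) - pos
        if length < header:
            break  # malformed box, stop rather than loop forever
        yield box_type, data[pos + header:pos + length]
        pos += length

def summarize_jpeg2000(data):
    """Report brand, compatibility list and presence of a fragment table."""
    info = {"brand": None, "compatibility": [], "fragment_table": False}
    for box_type, payload in iter_boxes(data):
        if box_type == b"ftyp" and len(payload) >= 8:
            info["brand"] = payload[:4]  # b"jp2 " or b"jpx "
            # Bytes 4-7 are the minor version; the rest is the
            # compatibility list, four bytes per entry.
            info["compatibility"] = [payload[i:i + 4]
                                     for i in range(8, len(payload) - 3, 4)]
        elif box_type == b"ftbl":
            info["fragment_table"] = True
    return info
```

A file whose summary shows `fragment_table` as true would be a candidate for the decoder problems discussed on the following slides.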
OS PDF viewers – JPEG 2000 libraries
Ghostscript: OpenJPEG or JasPer
Evince: OpenJPEG
Mupdf: OpenJPEG
Firefox PDF viewer: ...
Is it really a problem?
Fragmented codestreams extremely rare
But why is this feature even allowed in a long-
term archiva...
#SCAPEProject
http://www.scape-project.eu
This work was partially supported by the SCAPE Project.
The SCAPE project is co-...
PDF/A-3 for preservation. Notes on embedded files and JPEG2000

Johan van der Knijff of the National Library of the Netherlands presented ‘PDF/A-3 for preservation’, with notes on embedded files and JPEG 2000.
The presentation was given at a DPC briefing (http://bit.ly/1b487mD) that introduced and reviewed recent developments in the PDF/A standard, with particular emphasis on PDF/A version 3, published in October 2012. The meeting took place in Leeds on 13 March 2013.


  1. SCAPE Johan van der Knijff Koninklijke Bibliotheek – National Library of the Netherlands DPC, PDF/A-3 Briefing, Leeds, 13.3.2013 PDF/A-3 for preservation Notes on embedded files and JPEG 2000
  2. Part 1: Embedded files PDF/A-3: embedding of any file (type)
  3. Key point: Use of “embedded files” really means “embedded file streams” = specific data structure in PDF!
  4. File specification dictionary 31 0 obj <</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >> endobj
  5. File specification dictionary 31 0 obj <</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >> endobj EF key points to embedded file stream
  6. Embedded file stream 32 0 obj <</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>> stream …SVG Data… endstream endobj
  7. Uses of embedded file streams
  8. File attachments not meant to be rendered by viewer
  9. File attachment annotation EmbeddedFiles entry in name dictionary PDF/A-3
  10. Rendered in/by PDF viewer
  11. Rendition actions Screen annotations PDF/A-3
  12. What about inline images?
  13. What about inline images? Not based on “embedded file stream”, but on “Image XObject” data structure (allows limited set of pre-defined formats)
  14. Embedded files wrap-up: No impact on content that is meant to be rendered by PDF viewer But PDF/A-3’s may contain file of any possible format as an attachment
  15. Part 2: JPEG 2000 Supported since PDF/A-2
  16. Image XObject 1614 0 obj <</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB /BitsPerComponent 8/Interpolate true/Length 5278 /Filter/JPXDecode>> stream … Image data … :: :: endstream endobj
  17. Image XObject 1614 0 obj <</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB /BitsPerComponent 8/Interpolate true/Length 5278 /Filter/JPXDecode>> stream … Image data … :: :: endstream endobj Identifies object as JPEG 2000 image
  18. ISO 19005-2 (PDF/A-2): JPEG 2000 support based on subset of JPEG 2000 Part 2 (JPX baseline) Only Part 1 of the standard (JP2) commonly used for archival applications!
  19. JP2 vs JPX JP2 JPX JPEG 2000 Part 1: Basic still image format JPEG 2000 Part 2: = JP2 + assorted advanced stuff …
  20. Fragmented codestreams Allowed in JPX Baseline!
  21. OS PDF viewers – JPEG 2000 libraries Ghostscript: OpenJPEG or JasPer Evince: OpenJPEG Mupdf: OpenJPEG Firefox PDF viewer: built-in decoder → None of these libraries support fragmented codestreams!
  22. Is it really a problem? Fragmented codestreams extremely rare But why is this feature even allowed in a long-term archival format? OS support of JPEG 2000 in general remains problematic
  23. #SCAPEProject http://www.scape-project.eu This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137). Funding
