Turning the Page on Digital Content



Within Islandora, paged content has been handled in a number of ways. One of the simplest approaches is to store paged content as a single PDF document. This remains a viable option in Islandora and is supported by several solution packs, including the PDF Solution Pack. A more sophisticated and preservation-friendly approach is to treat each page as an individual digital object related to a parent object. The Book Solution Pack uses this model, and by changing the required metadata it can be repurposed for many other kinds of paged content, including journals. For instance, the Newspaper Solution Pack shares much of its code with the Book Solution Pack, but its metadata requirements and viewer are different.


  1. Turning the Page on Digital Content
     David Wilcox (dgi) & Kirsta Stapelfeldt (Islandora)
     Open Repositories 2013
  2. Outline
     • Content Models
     • Role of Metadata
     • Preparing Content for Ingest
     • Derivative Creation
     • Display
  3. Content Models
     • Books, monographs, journals, periodicals, newspapers
     • Formats: TIFF, JPEG, JPEG 2000, PDF (PDF/A)
       o PDF is stored as a single continuous object
       o Books and periodicals are stored atomistically (one object per page)
  4. RDF Statements
     [Diagram: a book object linked to its page objects by RDF statements]
  5. Book Object: Datastreams
     RELS-EXT   RDF statements connecting the book to its collection
     MODS       MODS metadata
     DC         Dublin Core metadata
     TN         Display thumbnail
     PDF        (Optional) A PDF of all pages, generated and stored at the book level
  6. Page Object: Datastreams
     RELS-EXT   RDF statements connecting the page to its book and declaring the page order
     MODS       MODS metadata
     DC         Dublin Core metadata
     TN         Display thumbnail
     OBJ        TIFF representing the page
     JP2        JPEG 2000
     JPG        Display JPEG (for the reader)
     OCR        Text (generated or uploaded)
     HOCR       Coordinate data (for generated text only)
     PDF        (Optional) PDF of the single page, generated and stored with the object
  7. Management Functions for Book Pages
     • Reordering, deletion, and replacement (of the object or its derivatives)
  8. Approaches to Metadata
     • Default is MODS and DC
     • Ability to add different metadata at the book and page levels
     • Ability to add an encoded text stream (TEI and HOCR)
       o Syncing issues
       o TEI schema
     • Next: How is content created and managed? (Interface Tour)
  9. Single Page Ingest
  10. Simple Batch Ingest
  11. Advanced Batch Ingest
  12. Derivative Generation
     • Kakadu → JP2
     • ImageMagick → JPG
     • Ghostscript → PDF
     • Tesseract → OCR/hOCR
  13. Displaying Content: Changes in Islandora 7
     • Greater generalization
     • Deprecation of the Google reader viewer and IIV
     • Viewers packaged as separate modules
  14. Displaying Content: Changes in Islandora 7
  15. Sample Projects (discoverygarden)
     • University of Manitoba: http://digitalcollections.lib.umanitoba.ca
     • Caltech: http://caltech.discoverygarden.ca
     • Williams College: http://unbound.williams.edu
  16. Sample Projects (UPEI)
     • The Island Magazine: http://vre2.upei.ca/islandmag
     • PEI Legislative Documents Online: http://peildo.ca/
     • Prince Edward Island Magazine: http://vre2.upei.ca/peimagazine/
     • The Charlottetown Guardian: http://newspapers.vre.upei.ca/
  17. Contact Us
     • David Wilcox: david@discoverygarden.ca
     • Kirsta Stapelfeldt: kstapelfeldt@upei.ca
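Slides 4 and 6 describe RDF statements in each page's RELS-EXT datastream that tie the page to its book and fix its place in the reading order. The following Python sketch builds such a datastream with the standard library. The predicate names (`isPageOf`, `isSequenceNumber`), the `http://islandora.ca/ontology/relsext#` namespace, the `islandora:pageCModel` content model, and the sample PIDs are assumptions based on common Islandora 7 usage rather than details from the slides. Check them against the RELS-EXT of a page object in your own repository.

```python
import xml.etree.ElementTree as ET

# Namespace URIs and predicates are ASSUMED from common Islandora 7 usage.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FEDORA_MODEL = "info:fedora/fedora-system:def/model#"
ISLANDORA = "http://islandora.ca/ontology/relsext#"

ET.register_namespace("rdf", RDF)
ET.register_namespace("fedora-model", FEDORA_MODEL)
ET.register_namespace("islandora", ISLANDORA)


def page_rels_ext(page_pid: str, book_pid: str, sequence: int) -> str:
    """Return RELS-EXT RDF/XML tying a page object to its book and position."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": f"info:fedora/{page_pid}"}
    )
    # Declare the page content model (assumed PID: islandora:pageCModel).
    ET.SubElement(
        desc,
        f"{{{FEDORA_MODEL}}}hasModel",
        {f"{{{RDF}}}resource": "info:fedora/islandora:pageCModel"},
    )
    # Object-valued statement pointing at the parent book.
    ET.SubElement(
        desc, f"{{{ISLANDORA}}}isPageOf", {f"{{{RDF}}}resource": f"info:fedora/{book_pid}"}
    )
    # Literal-valued statement fixing the page's place in reading order.
    ET.SubElement(desc, f"{{{ISLANDORA}}}isSequenceNumber").text = str(sequence)
    return ET.tostring(root, encoding="unicode")
```

For example, `page_rels_ext("book:1-0003", "book:1", 3)` describes the third page of a hypothetical book `book:1`. Keeping order as a literal on each page, rather than as a list on the book, is what makes every page an independently managed object.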
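The datastream lists on slides 5 and 6 can also be read as a checklist. The sketch below flags objects that lack an expected datastream, such as pages whose derivative generation failed. It treats every listed datastream as required except PDF, which the slides mark optional, and HOCR, which exists only for generated OCR. That split is an assumption, and `missing_required` is a hypothetical helper.

```python
# Datastream IDs as listed on slides 5 and 6.
# True = treated as required (an ASSUMPTION); False = optional.
BOOK_DATASTREAMS = {
    "RELS-EXT": True, "MODS": True, "DC": True, "TN": True, "PDF": False,
}
PAGE_DATASTREAMS = {
    "RELS-EXT": True, "MODS": True, "DC": True, "TN": True, "OBJ": True,
    "JP2": True, "JPG": True, "OCR": True,
    "HOCR": False,  # only present when OCR was generated, not uploaded
    "PDF": False,   # optional single-page PDF
}


def missing_required(expected: dict[str, bool], present: set[str]) -> list[str]:
    """Return required datastream IDs absent from `present`, in declaration order."""
    return [dsid for dsid, required in expected.items()
            if required and dsid not in present]
```

A check like this could be run over freshly ingested pages to find ones that need their derivatives regenerated.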
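Page order lives in each page's RELS-EXT (slide 6), so the reordering and deletion functions on slide 7 require renumbering the pages that follow the change. The sketch below models a book as an ordered list of page PIDs and reports which pages would need a new sequence number. The function names and PIDs are hypothetical, and writing the updated RELS-EXT back to the repository is left out.

```python
def sequence_numbers(pages: list[str]) -> dict[str, int]:
    """Map each page PID to its 1-based position in reading order."""
    return {pid: i for i, pid in enumerate(pages, start=1)}


def move_page(pages: list[str], pid: str, new_position: int) -> list[str]:
    """Return a new ordering with `pid` moved to the 1-based `new_position`."""
    reordered = [p for p in pages if p != pid]
    reordered.insert(new_position - 1, pid)
    return reordered


def delete_page(pages: list[str], pid: str) -> list[str]:
    """Return a new ordering without `pid`; later pages shift up by one."""
    return [p for p in pages if p != pid]


def changed_sequence_numbers(old: list[str], new: list[str]) -> dict[str, int]:
    """Pages whose sequence number differs, i.e. whose RELS-EXT needs rewriting."""
    before, after = sequence_numbers(old), sequence_numbers(new)
    return {pid: n for pid, n in after.items() if before.get(pid) != n}
```

Moving the last of four pages to position 2 changes the numbers of three pages, while the first page is untouched. Limiting updates to changed pages keeps the number of repository writes proportional to the edit rather than to the book's length.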
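Slide 8 mentions syncing issues when a page carries both an OCR and an HOCR text stream. One way to reduce drift is to regenerate the plain-text OCR from the hOCR instead of maintaining the two independently. This is a technique swapped in for illustration, not necessarily what Islandora does. The sketch uses Python's `html.parser` and assumes Tesseract-style hOCR, where words are `ocrx_word` elements nested inside line-level elements such as `ocr_line`.

```python
from html.parser import HTMLParser

# Line-level hOCR classes; header/caption/textfloat lines are ASSUMED to
# follow the same convention as ocr_line in newer Tesseract output.
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}
# Void HTML elements have no end tag, so they are never pushed on the stack.
VOID_TAGS = {"meta", "br", "img", "link", "hr", "input"}


class HocrTextExtractor(HTMLParser):
    """Collect the words of an hOCR document, grouped by line."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[list[str]] = []
        self._open: list[set[str]] = []  # class sets of currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        self._open.append(classes)
        if classes & LINE_CLASSES:
            self.lines.append([])  # a new text line begins

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Only text inside an ocrx_word element counts as recognized text.
        word = data.strip()
        if word and any("ocrx_word" in c for c in self._open):
            if not self.lines:
                self.lines.append([])
            self.lines[-1].append(word)


def hocr_to_text(hocr: str) -> str:
    """Rebuild plain OCR text (one output line per hOCR line) from hOCR."""
    parser = HocrTextExtractor()
    parser.feed(hocr)
    parser.close()
    return "\n".join(" ".join(words) for words in parser.lines if words)
```

Treating HOCR as the single source of truth means a correction made to a word's hOCR entry automatically flows into the OCR datastream the next time it is regenerated.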
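Batch ingest (slides 10 and 11) starts from files on disk. The sketch below turns one book directory into an ingest plan: an optional book-level MODS record plus pages in numeric order. The directory layout and the `plan_book_ingest` function are hypothetical. The layout assumes a `MODS.xml` file plus numbered page subdirectories, each holding an `OBJ.tif`. Consult the batch module's documentation for the layout it actually expects.

```python
from pathlib import Path


def plan_book_ingest(book_dir: Path) -> dict:
    """Describe one book directory as a parent record plus ordered pages.

    HYPOTHETICAL layout:
        book_dir/MODS.xml        book-level metadata (optional)
        book_dir/<n>/OBJ.tif     one numbered subdirectory per page
    """
    mods = book_dir / "MODS.xml"
    page_dirs = [d for d in book_dir.iterdir() if d.is_dir() and d.name.isdigit()]
    page_dirs.sort(key=lambda d: int(d.name))  # numeric sort: 10 follows 9, not 1
    pages = []
    # Sequence numbers are reassigned contiguously, so gaps (1, 2, 5) become 1, 2, 3.
    for seq, page_dir in enumerate(page_dirs, start=1):
        obj = page_dir / "OBJ.tif"
        if not obj.is_file():
            raise FileNotFoundError(f"page directory {page_dir} has no OBJ.tif")
        pages.append({"sequence": seq, "obj": obj})
    return {"mods": mods if mods.is_file() else None, "pages": pages}
```

Sorting numerically rather than lexically matters here: a plain string sort would place page `10` between pages `1` and `2`.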
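Slide 12 names a tool for each derivative. The sketch below only builds the command lines, so they can be inspected or tested without the tools installed. The invocations shown are common ones for `kdu_compress`, ImageMagick's `convert`, `tesseract`, and Ghostscript's `gs`. However, the resize geometry, the output names, and the use of Ghostscript to concatenate per-page PDFs into the optional book-level PDF (slide 5) are assumptions, not Islandora's actual settings.

```python
from pathlib import Path


def derivative_commands(tiff: Path, out_dir: Path) -> dict[str, list[str]]:
    """Build (but do not run) one command line per page derivative."""
    stem = out_dir / tiff.stem
    return {
        # Kakadu: JPEG 2000 from the OBJ TIFF.
        "JP2": ["kdu_compress", "-i", str(tiff), "-o", f"{stem}.jp2"],
        # ImageMagick: display JPEG; the 600x800 bounding box is an ASSUMED size.
        "JPG": ["convert", str(tiff), "-resize", "600x800", f"{stem}.jpg"],
        # Tesseract: plain-text OCR, written to <stem>.txt.
        "OCR": ["tesseract", str(tiff), str(stem)],
        # Tesseract with the hocr config: written to <stem>.hocr
        # (very old versions write .html instead).
        "HOCR": ["tesseract", str(tiff), str(stem), "hocr"],
    }


def book_pdf_command(page_pdfs: list[Path], out_pdf: Path) -> list[str]:
    """Ghostscript: concatenate per-page PDFs into one book-level PDF (ASSUMED use)."""
    return ["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
            f"-sOutputFile={out_pdf}", *map(str, page_pdfs)]
```

Each command could then be executed with `subprocess.run(cmd, check=True)`. Building commands separately from running them keeps every tool flag in one reviewable place and lets a failed derivative be regenerated on its own.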