Recommendations for a digital newspaper descriptive metadata specification, based on the National Digital Newspaper Program (NDNP) technical specification but extended for uses beyond NDNP.
2. Working Group Membership
● Chair: Karen Estlund (Pennsylvania/Oregon)
● Luis Baquera (California)
● Brian Geiger (California)
● Mark Phillips (Texas)
● Shawn Schollmeyer (Washington)
● Kopana Terry (Kentucky)
● Laura Weakly (Nebraska)
● Eric Weig (Kentucky)
● Frederick Zarndt (Independent)
3. Activities
1. Metadata Application Profile
a. Functional Requirements
b. Data Model
c. Metadata Schema
2. File Formats & Resolution Recommendations
3. Directory Structure Recommendations
4. Backup & Storage Recommendations
https://sites.google.com/site/digitalnewspaperspractices/technical-specifications
5. Functional Requirements
1. Newspapers should be retrieved based on issues
2. Items may be sorted and retrieved by date of issue
3. Multiple editions for particular issues may be related to an issue
4. Aggregated and common titles can be used to retrieve user-friendly results beyond the serials catalog record for a title
5. The model must use the NDNP model as a baseline
6. Identifiers should be present to correspond with additional metadata
resources whenever possible
7. Newspaper content should be retrievable based on copyright associated
with the work at an issue level
8. Full-text searching is assumed and not represented in the descriptive
model
6. Mandatory Metadata Properties
Digital Responsible Institution
[at least one identifier]:
● LCCN
● ISSN
● OCLC
● Local Identifier
Issue Date
Title
Publication Location
Edition Order
Rights
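A minimal sketch of how the mandatory properties above could be checked for one issue record. The dictionary keys are hypothetical (the profile names the properties, not how they are encoded), and the rule that any one of the four identifiers is enough follows the "[at least one identifier]" note.

```python
# Hypothetical key names; the profile lists properties, not field names.
REQUIRED = (
    "digital_responsible_institution",
    "issue_date",
    "title",
    "publication_location",
    "edition_order",
    "rights",
)
# Any one of these satisfies the "[at least one identifier]" rule.
IDENTIFIERS = ("lccn", "issn", "oclc", "local_identifier")


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory properties that are absent or empty in record."""
    missing = [name for name in REQUIRED if record.get(name) in (None, "")]
    if all(record.get(name) in (None, "") for name in IDENTIFIERS):
        missing.append("identifier")
    return missing


# Illustrative record; only the LCCN is taken from the directory example
# elsewhere in this deck, every other value is made up.
example = {
    "digital_responsible_institution": "Example Library",
    "lccn": "sn86069643",
    "issue_date": "2012-03-01",
    "title": "Example Title",
    "publication_location": "Example City",
    "edition_order": 1,
    "rights": "Example rights statement",
}
print(missing_mandatory(example))
```

A record with no LCCN, ISSN, OCLC number, or local identifier would be reported as missing "identifier", in addition to any other empty property.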
7. Metadata Properties Added to Profile
Digital Responsible Institution*
ISSN
OCLC Number
Local Identifier
[Original Object Information]
Common Title / Curated Title
Common Title / Curated ID
Rights*
Language
8. File Format & Resolution Recommendations
1. Microfilm
a. 300-400 ppi
b. 8-bit grayscale
2. Paper
a. 250 ppi
b. 8-bit grayscale (even for color)
3. Born Digital
a. PDF -> PDF/A
b. PDF -> TIFF images
c. Websites -> harvest to WARC
Preservation Formats:
● TIFF 6.0 Uncompressed
● PDF/A (flavor of your choice)
● WARC
Access Formats:
● JP2
● JPEG, and/or
● PDF
9. Directory Structure Recommendations
● University of Kentucky Libraries
○ [collection uniquecode]/[lccn]/issues/[YYYY]/[uniquecode][YYYYMMDDED]/
○ lvc/sn86069643/issues/2012/lvc2012030101/
● Center for Bibliographical Studies and Research (CBSR) at the University of
California, Riverside
○ [batch_directory]/[pub_code]/YYYYMMDD[_EE]/
○ batch_curiv_eagle/SFC/19101226/
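The two layouts above can be sketched as small path builders. This is an illustration only: the function and parameter names are hypothetical, the Kentucky pattern is assumed to reuse the collection code as the [uniquecode] prefix (as in the lvc example), and ED / EE are assumed to be two-digit edition numbers.

```python
from datetime import date
from typing import Optional


def uk_issue_path(code: str, lccn: str, issued: date, edition: int) -> str:
    """[collection uniquecode]/[lccn]/issues/[YYYY]/[uniquecode][YYYYMMDDED]/"""
    # Assumption: the same code is used for both uniquecode slots.
    return (
        f"{code}/{lccn}/issues/{issued:%Y}/"
        f"{code}{issued:%Y%m%d}{edition:02d}/"
    )


def cbsr_issue_path(batch_directory: str, pub_code: str, issued: date,
                    edition: Optional[int] = None) -> str:
    """[batch_directory]/[pub_code]/YYYYMMDD[_EE]/"""
    # Assumption: the optional _EE suffix is a zero-padded edition number.
    suffix = f"_{edition:02d}" if edition is not None else ""
    return f"{batch_directory}/{pub_code}/{issued:%Y%m%d}{suffix}/"


print(uk_issue_path("lvc", "sn86069643", date(2012, 3, 1), 1))
print(cbsr_issue_path("batch_curiv_eagle", "SFC", date(1910, 12, 26)))
```

With the inputs shown, the two calls reproduce the example directories listed for each institution.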
10. Storage & Backup Recommendations
1. External Hard Drives
2. Networked Local Server
3. Engineering Backup Servers
4. Cloud Hosting
Preservation Best Practices
Following preservation best practices for digital newspaper content is encouraged.
More information about digital preservation best practices is available from the
Library of Congress: http://www.loc.gov/preservation/.
11. Scripts & Software
Scripts and software to help with processing, hosting, or preserving digital
newspapers. For additional resources, see PaperVault "Tools for Working with
Digital News".
● Open Source Newspaper Viewer, chronam/LC Newspaper Viewer: https://github.com/LibraryOfCongress/chronam
● Open Source JP2 Image Server, RAIS: https://github.com/uoregon-libraries/rais-image-server
● ALTO-like XML
○ PDF2ALTO, https://github.com/cokernel/pdf2alto
○ PDF to Text, https://github.com/uoregon-libraries/pdftotext
● PDFs to NDNP-like technical specification, https://github.com/uoregon-libraries/pdf-to-chronam