
Standards and procedure in digitization and digital preservation



  1. CHITO N. ANGELES
  2. Are there standards for digitization or digital archiving? Yes, but limited to certain aspects only.
  3. ISO/TR 13028:2010 - Information and documentation - Implementation guidelines for digitization of records. Not applicable to: technical specifications for the digital capture of records; technical specifications for the long-term preservation of digital records; or digitization of existing archival holdings for preservation purposes, etc.
  4. ISO 19005-1:2005; ISO 19005-2:2011; ISO 19005-3:2012 (under development) - Document management - Electronic document file format for long-term preservation. Specifies how to use the Portable Document Format (PDF) for long-term preservation of electronic documents. The standard is known as PDF/A.
  5. Unlike preservation microfilming and photocopying, there are no formal standards that govern the capture, processing, and storage of digital images. There are, however, a number of projects and publications that have set forth best practices for creating high-quality digital images, access systems, and storage systems.
  6. Digitization, also known as imaging or scanning, is the means of converting hard-copy (non-digital) records into digital format. Hard-copy or non-digital records include audio, visual, image, or text records. Digitization may also be undertaken by taking digital photographs of the source records, where appropriate. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  7. Digital Preservation - a process by which digital data is preserved in digital form in order to ensure the usability, durability, and intellectual integrity of the information contained therein. A more precise definition is: the storage, maintenance, and accessibility of a digital object over the long term, usually as a consequence of applying one or more digital preservation strategies. These strategies may include technology preservation, technology emulation, or data migration. Source: The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (2002).
  8. Born Digital - digital materials that are created and retained in digital form. They may or may not have a non-digital equivalent. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  9. Digital Repository / Archive - a digital repository is where digital content and assets are stored and can be searched and retrieved for later use. A repository supports mechanisms to import, export, identify, store, and retrieve digital assets. Putting digital content into a repository enables staff and institutions to manage and preserve it, and therefore derive maximum value from it. Digital repositories may include research outputs and journal articles, theses, e-learning objects and teaching materials, or research data. Source: Digital Repositories: Helping universities and colleges. JISC, August 2005.
  10. Master - a faithful digital reproduction of a document, optimized for longevity and for the production of a range of delivery versions (derivatives). Masters are captured at the highest practicable quality or resolution and stored for long-term use. Typically, masters are stored offline on tape or CD and are accessed only to produce derivative images. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  11. Derivative - an image created from the master image through some kind of image editing process to create a user or working copy. The process usually involves a loss of information to reduce the file size: resampling to a lower resolution, applying lossy compression, or altering the image with other image processing techniques. Typically, derivatives are made for purposes such as web access, including “thumbnail” images, or as “reference” or “service” images that fit completely within an average monitor. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  12. Digital images - electronic snapshots taken of a scene or scanned from documents, such as photographs, manuscripts, printed texts, and artwork. The digital image is sampled and mapped as a grid of dots or picture elements (pixels). Each pixel is assigned a tonal value (black, white, shades of gray, or color), which is represented in binary code (zeros and ones).
  13. Resolution - a measure of the ability to capture detail in the original work. The spatial frequency at which a digital image is sampled (the sampling frequency) is often a good indicator of resolution. Dots per inch (dpi) and pixels per inch (ppi) are common and synonymous terms used to express resolution for digital images.
  14. Pixel Dimensions - the horizontal and vertical measurements of an image expressed in pixels. They may be determined by multiplying both the width and the height of the document (in inches) by the dpi. Example: an 8" x 10" document scanned at 300 dpi has pixel dimensions of 2,400 pixels (8" x 300 dpi) by 3,000 pixels (10" x 300 dpi).
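The pixel-dimension arithmetic above can be sketched in a few lines of Python. `pixel_dimensions` is a hypothetical helper, not part of any library, and rounding to the nearest whole pixel is an assumption that only matters when size x dpi is not a whole number.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a document scanned at `dpi`.

    Each physical dimension (in inches) is multiplied by the scanning
    resolution; rounding to whole pixels is an assumption of this sketch.
    """
    return round(width_in * dpi), round(height_in * dpi)


# The 8" x 10" example from the slide, scanned at 300 dpi:
print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
```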
  15. Bit Depth - determined by the number of bits used to define each pixel. The greater the bit depth, the greater the number of tones (grayscale or color) that can be represented. Digital images may be produced in black and white (bitonal), grayscale, or color.
  16. Bit Depth: A bitonal image is represented by pixels consisting of 1 bit each, which can represent two tones (typically black and white), using the value 0 for black and 1 for white or vice versa. A grayscale image is composed of pixels represented by multiple bits of information, typically ranging from 2 to 8 bits or more.
  17. Bit Depth: A color image is typically represented by a bit depth ranging from 8 to 24 or higher. With a 24-bit image, the bits are often divided into three groupings: 8 for red, 8 for green, and 8 for blue. Combinations of those bits are used to represent other colors. A 24-bit image offers 16.7 million (2^24) color values.
  18. File Size - calculated by multiplying the surface area of the document to be scanned (height x width) by the bit depth and by the square of the dpi (dpi²). Because image file size is expressed in bytes, which are made up of 8 bits, divide this figure by 8.
      Formula 1 for File Size: FS = (height x width x bit depth x dpi²) / 8
  19. File Size example: Compute the file size of a US-Letter-size page (8.5" x 11") captured in 8-bit grayscale at 100 dpi.
      FS = (8.5 x 11 x 8 x 100²) / 8
      FS = 935,000 bytes
  20. 20. File Size If the pixel dimensions are given, multiply them by each other and the bit depth to determine the number of bits in an image file. Formula 2 for File Size FS=(pixel dimensions x bit depth) / 8
  21. 21. File Size Example: Compute the file size of a 24-bit image captured with a digital camera with pixel dimensions of 2,048 x 3,072. FS = (2048 x 3072 x 24)/8 FS = 18,874,368 bytes.
  22. Compression - algorithms designed to reduce the size of the image for storage or transmission. Lossless schemes (e.g., ITU-T.6) abbreviate the binary code without discarding any information, so that when the image is "decompressed" it is bit-for-bit identical to the original. They are most often used with bitonal scanning of textual material. Lossy schemes (e.g., JPEG) average or discard the least significant information, based on an understanding of visual perception. They are typically used with tonal images.
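ITU-T.6 (CCITT Group 4) is not available in Python's standard library, so the sketch below swaps in DEFLATE via the `zlib` module, a different lossless scheme, purely to illustrate the defining property of lossless compression: the decompressed data is bit-for-bit identical to the original. The input is synthetic, not a real scan.

```python
import zlib

# Synthetic stand-in for bitonal scan data: long runs of white bytes (0xFF)
# interrupted by short runs of black (0x00), the redundancy lossless coders exploit.
original = (b"\xff" * 200 + b"\x00" * 8) * 500

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print(f"{len(original):,} bytes -> {len(compressed):,} bytes")
print("bit-for-bit identical:", restored == original)  # True
```

A lossy scheme such as JPEG would fail that final equality check by design, which is why lossy formats are reserved for derivatives rather than masters.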
  23. File Formats - consist of both the bits that comprise the image and header information on how to read and interpret the file. File formats vary in terms of resolution, bit depth, color capabilities, and support for compression and metadata.
  24. Optical Character Recognition (OCR) - a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data. Source: http://finereader.abbyy.com/about_ocr/whatis_ocr/
  25. Quality (usability, functionality); Persistence (long-term access); Interoperability (e.g., across platforms and software environments); Storage Space (file size); Storage Hardware; Storage Media (e.g., DVDs, CDs).
  26. Master copies should be created to the highest technical standards achievable. Image formats should be open (non-proprietary) and have published technical specifications available in the public domain. Image formats should be widely supported by many software applications and operating systems.
  27. Digitize an original or first-generation copy (i.e., a print rather than microfilm) of the source material to achieve the best possible image quality. Create backup copies of all files on servers and storage media (e.g., DVDs) and have an off-site backup strategy. Create meaningful metadata for image files or collections.
  28. Third-party copyright and other constraints inherent in the record should be resolved prior to digitization. OCR should be performed on all digital reproductions where the content is primarily textual and computer processed. Collections that are photographic in nature, and those that are not computer processed, do not require OCR. Plan for future technological developments and migration.
  29. Tagged Image File Format (TIFF)
      ◦ Extensions: .tif, .tiff
      ◦ Bit depths: 1-bit bitonal; 4- or 8-bit grayscale or palette color; up to 64-bit color
      ◦ Compression: uncompressed; lossless (ITU-T.6, LZW, etc.); lossy (JPEG)
      ◦ Standard/Proprietary: de facto standard
      ◦ Web support: plug-in or external application
      ◦ Supports multiple images per file (multi-page)
  30. Joint Photographic Experts Group (JPEG) / JPEG File Interchange Format (JFIF)
      ◦ Extensions: .jpg, .jpeg, .jif, .jfif
      ◦ Bit depths: 8-bit grayscale; 24-bit color
      ◦ Compression: lossless; lossy (JPEG)
      ◦ Standard/Proprietary: JPEG: ISO 10918-1/2; JFIF: de facto standard
      ◦ Web support: native since Microsoft Internet Explorer 2 and Netscape Navigator 2
  31. JP2/JPX (JPEG 2000)
      ◦ Extensions: .jp2, .jpx, .j2k, .j2c
      ◦ Bit depths: up to 2^14 (16,384) channels, each with 1-38 bits; grayscale or color
      ◦ Compression: uncompressed; lossless or lossy (wavelet)
      ◦ Standard/Proprietary: JPEG 2000: ISO/IEC 15444, parts 1-6 and 8-11
      ◦ Web support: plug-in
  32. Portable Document Format (PDF)
      ◦ Extension: .pdf
      ◦ Bit depths: 4-bit grayscale; 8-bit color; up to 64-bit color support
      ◦ Compression: uncompressed; lossless (ITU-T.6, LZW, JBIG); lossy (JPEG)
      ◦ Standard/Proprietary: de facto standard
      ◦ Web support: plug-in or external application
      ◦ Can contain an OCR text layer
  33. DjVu, pronounced “day·zha·voo”
      ◦ Extension: .djvu
      ◦ Bit depths: 1-bit bitonal; 4- to 8-bit grayscale; 24-bit color
      ◦ Compression: JB2 for bitonal content (lossless or lossy); IW44 wavelet compression for grayscale and color content (lossy)
      ◦ Standard/Proprietary: emerging standard
      ◦ Web support: plug-in or external application
      ◦ Supports multiple images per file (multi-page)
      ◦ Can contain an OCR text layer
  34. 34. DjVu High quality image compression technique: ◦Scanned bitonal: 300dpi: 5-40K per page (3-10 times better than TIFF/G4). ◦5-10 times better than thanJPEG or PDF
  35. Image Masters
      ◦ Preservation / archive copy
      ◦ Uncompressed
      ◦ Highest possible quality recommended
      Derivatives
      ◦ Display / viewing / reading
      ◦ Printing
      ◦ Thumbnails
  36. Image Masters
      ◦ TIFF
      ◦ JPEG (if using digital cameras)
      Derivatives / Deliverables
      ◦ Text/documents: PDF, DjVu
      ◦ Photographs: PNG, DjVu
  37. Black and White
      ◦ File format: TIFF
      ◦ Compression: uncompressed, or lossless compressed using CCITT Group 4 (ITU-T.6)
      ◦ Resolution and bit depth: 600 dpi, 1-bit bitonal
      Grayscale
      ◦ File format: TIFF
      ◦ Compression: uncompressed, or lossless compressed using LZW or JPEG 2000
      ◦ Resolution and bit depth: 300 dpi, 8-bit grayscale
  38. Color
      ◦ File format: TIFF
      ◦ Compression: uncompressed, or lossless compressed using LZW or JPEG 2000
      ◦ Resolution and bit depth: 300 dpi, 24-bit color
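The recommended master settings above can double as a machine-checkable spec during quality control. A sketch under stated assumptions: `MasterSpec`, `MASTER_SPECS`, and `check_master` are hypothetical names, the compression labels are made-up identifiers, and the dpi values are treated as minimums (higher resolutions pass), which the slides do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterSpec:
    file_format: str
    min_dpi: int          # treated as a minimum: an assumption of this sketch
    bit_depth: int
    compression: frozenset  # allowed compression labels (hypothetical identifiers)


# Values transcribed from the recommended black-and-white, grayscale, and color masters.
MASTER_SPECS = {
    "bitonal": MasterSpec("TIFF", 600, 1, frozenset({"none", "ccitt_g4"})),
    "grayscale": MasterSpec("TIFF", 300, 8, frozenset({"none", "lzw", "jpeg2000_lossless"})),
    "color": MasterSpec("TIFF", 300, 24, frozenset({"none", "lzw", "jpeg2000_lossless"})),
}


def check_master(kind: str, file_format: str, dpi: int, bit_depth: int, compression: str) -> list[str]:
    """Return a list of problems; an empty list means the master meets its spec."""
    spec = MASTER_SPECS[kind]
    problems = []
    if file_format.upper() != spec.file_format:
        problems.append(f"file format {file_format} (expected {spec.file_format})")
    if dpi < spec.min_dpi:
        problems.append(f"resolution {dpi} dpi (expected at least {spec.min_dpi} dpi)")
    if bit_depth != spec.bit_depth:
        problems.append(f"bit depth {bit_depth} (expected {spec.bit_depth})")
    if compression not in spec.compression:
        problems.append(f"compression {compression!r} (expected one of {sorted(spec.compression)})")
    return problems


print(check_master("color", "tiff", 300, 24, "lzw"))    # []
print(check_master("bitonal", "JPEG", 300, 8, "jpeg"))  # four problems
```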
  39. Thumbnail
      ◦ File format: JPEG
      ◦ Compression: lossy
      ◦ Resolution: 72-100 dpi
      View / Service copy
      ◦ File format: JPEG / PDF / DjVu
      ◦ Compression: lossy
      ◦ Resolution: 72-100 dpi
      Print copy
      ◦ File format: PDF / DjVu
      ◦ Compression: lossy
      ◦ Resolution: 100-150 dpi
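Because derivatives are made by resampling the master to a lower resolution, their pixel dimensions follow from the ratio of the two dpi values. A minimal sketch; `derivative_dimensions` is a hypothetical helper, and the example assumes a US-Letter page mastered at 300 dpi, using the low end of each derivative's resolution range plus 150 dpi for the print copy.

```python
def derivative_dimensions(master_px: tuple[int, int], master_dpi: int, target_dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a derivative resampled from master_dpi to target_dpi."""
    width, height = master_px
    return (round(width * target_dpi / master_dpi), round(height * target_dpi / master_dpi))


# A US-Letter page (8.5" x 11") mastered at 300 dpi is 2,550 x 3,300 pixels.
master = (2550, 3300)
for name, dpi in [("thumbnail", 72), ("service copy", 100), ("print copy", 150)]:
    print(f"{name}: {derivative_dimensions(master, 300, dpi)}")
```

The sketch only computes target sizes; the resampling itself would be done in an image-processing tool.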
  40. Flatbed Scanner
      ◦ The best-known and best-selling type of scanner
  41. Sheet-Fed Scanner
      ◦ Uses the same basic technology as a flatbed, but maximizes throughput, usually at the expense of quality
      ◦ Designed for high-volume scanning
  42. Overhead Scanner
      ◦ High-speed book scanner
      ◦ Sometimes referred to as a “planetary scanner”
      ◦ Bound volumes can be placed face up for scanning
  43. V-Shaped Book Scanner
      ◦ Uses digital SLR cameras and a unique V-shaped, auto-adjusting book cradle and platen to capture sharp images at up to 700 pages an hour
      ◦ Natively captures flat images, so no page-curvature correction is needed
  44. Image Capture and Processing
      ◦ IrfanView (freeware): image capture, conversion, processing
      ◦ Adobe Acrobat (proprietary): PDF creation, conversion, processing, OCR, watermarks
      ◦ Document Express Editor (proprietary): DjVu creation, conversion, processing, OCR
  45. 45. Image Capture Image Processing Quality Control Delivery Storage and Backup
  46. Image Capture: document(s) or other materials are captured in digital form using a scanner or digital camera.
      Guidelines and procedures:
      ◦ Pre-scanning: prepare an item-level inventory list.
      ◦ Copyright statement: should accompany each digital file. If the files are accessed from the web, the copyright statement can be displayed on the website (if the same rights apply to all items on the site).
  47. Image Processing:
      ◦ Image editing (if necessary): compression of files, sharpening of images, deskewing, image rotation, cropping, deleting and reordering pages
      ◦ Optical character recognition
      ◦ Creating derivatives
      ◦ Adding watermarks
      ◦ Adding security (e.g., restrictions on copying, printing, or extraction, and password protection)
      ◦ Creating metadata describing the scanned materials
  48. What to look for when checking digital images for quality:
      ◦ Missing pages
      ◦ Incorrect order of pages
      ◦ Pages of different sizes
      ◦ Readability of text
      ◦ Black or white areas on parts of the page that cover the content
      ◦ Image not the correct size
      ◦ Image in the wrong resolution
      ◦ Image in the wrong file format
  49. What to look for when checking digital images for quality (continued):
      ◦ Image in the wrong mode or bit depth
      ◦ Overall lighting problems (e.g., too dark)
      ◦ Loss of detail in highlights or shadows
      ◦ Poor contrast
      ◦ Uneven tone or flare
      ◦ Missing scan lines or dropped-out pixels
      ◦ Lack of sharpness
      ◦ Excessive sharpening
      ◦ Image in the wrong orientation
  50. What to look for when checking digital images for quality (continued):
      ◦ Image not centered, or skewed
      ◦ Incomplete or cropped images
      ◦ Excessive noise (check dark areas)
      ◦ Misaligned color channels
      ◦ Image processing and scanner artifacts (e.g., extraneous lines, noise, banding)
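Two of the checks listed above, missing pages and incorrect page order, can be automated when files carry page numbers in their names. A sketch under stated assumptions: the naming convention `<item>_p<page>.<ext>` is hypothetical, the expected page count is taken from the item-level inventory list prepared before scanning, and the file list is given in the order the pages were assembled (e.g., a batch manifest). The visual checks still need a human reviewer.

```python
import re

# Hypothetical naming convention: <item>_p<page number>.<ext>, e.g. "ms042_p0007.tif".
PAGE_RE = re.compile(r"_p(\d+)\.\w+$")


def page_sequence_problems(filenames: list[str], expected_pages: int) -> dict:
    """Report missing, unexpected, and duplicated page numbers, plus out-of-order files."""
    pages = []
    for name in filenames:
        match = PAGE_RE.search(name)
        if match is None:
            raise ValueError(f"unrecognised file name: {name}")
        pages.append(int(match.group(1)))
    expected = set(range(1, expected_pages + 1))
    seen = set(pages)
    return {
        "missing": sorted(expected - seen),
        "unexpected": sorted(seen - expected),
        "duplicated": sorted({p for p in pages if pages.count(p) > 1}),
        "out_of_order": pages != sorted(pages),
    }


files = ["ms042_p0001.tif", "ms042_p0003.tif", "ms042_p0002.tif", "ms042_p0005.tif"]
report = page_sequence_problems(files, expected_pages=5)
print(report)
```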
  51. Delivery - the process of getting the scanned images to the user through computer networks/the Web, monitors, and printers.
      Delivery methods:
      ◦ Removable storage devices
      ◦ Optical media (CDs, DVDs)
      ◦ Static web pages
      ◦ Digital repositories
  52. Recommended digital repository software:
      ◦ EPrints
      ◦ DSpace
      ◦ Greenstone
  53. Strategies for storage and backup may include:
      ◦ A dedicated server or shared storage solution:
          ◦ Database systems
          ◦ File-based systems (FTP, WebDAV, shared folders)
      ◦ Writing the digitized records to magnetic tape
      ◦ Writing the digitized records to optical media (e.g., CD, DVD)
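Backup copies are only useful if they are still bit-for-bit identical to the primary files. Comparing checksums is one common way to confirm this; the slides do not prescribe it, so treat the approach as an assumption. The sketch uses SHA-256 from Python's `hashlib`; `sha256_of` and `verify_backup` are hypothetical helpers, and the demo uses throwaway directories in place of real storage.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large masters need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(primary_dir, backup_dir) -> list[str]:
    """List files (relative paths) whose backup copy is missing or not bit-identical."""
    primary_dir, backup_dir = Path(primary_dir), Path(backup_dir)
    failures = []
    for original in sorted(primary_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(primary_dir)
        backup_copy = backup_dir / relative
        if not backup_copy.is_file() or sha256_of(backup_copy) != sha256_of(original):
            failures.append(relative.as_posix())
    return failures


# Demo with throwaway directories standing in for primary storage and a backup.
with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as backup:
    master = Path(primary, "ms042_p0001.tif")
    master.write_bytes(b"stand-in for master image data")
    shutil.copy(master, backup)
    clean_report = verify_backup(primary, backup)  # []
    Path(backup, master.name).write_bytes(b"corrupted copy")
    damaged_report = verify_backup(primary, backup)  # ['ms042_p0001.tif']

print(clean_report, damaged_report)
```

Storing the checksums alongside the files (for example, in the repository's metadata) would also let later checks detect silent corruption on tape or optical media.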
