CHITO N. ANGELES
Are there standards for digitization or digital archiving? 
Yes, but limited to certain aspects only.
ISO/TR 13028:2010 - Information and documentation - Implementation guidelines for digitization of records.
Not applicable to: technical specifications for the digital capture of records; technical specifications for the long-term preservation of digital records; or digitization of existing archival holdings for preservation purposes, etc.
ISO 19005-1:2005; ISO 19005-2:2011; ISO 19005-3:2012 (under development) - Document management - Electronic document file format for long-term preservation.
Specifies how to use the Portable Document Format (PDF) for long-term preservation of electronic documents. 
Standard is known as PDF/A.
Unlike preservation microfilming and photocopying, there are no formal standards that govern the capture, processing, and storage of digital images. 
There are, however, a number of projects and publications that have set forth best practices for creating high-quality digital images, access systems, and storage systems.
Digitization - also known as imaging or scanning, the means of converting hard-copy, or non-digital, records into digital format.
Hard-copy or non-digital records include audio, visual, image or text. 
Digitization may also be undertaken by taking digital photographs of the source records, where appropriate. 
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
Digital Preservation - a process by which digital data is preserved in digital form in order to ensure the usability, durability and intellectual integrity of the information contained therein.
A more precise definition is: the storage, maintenance, and accessibility of a digital object over the long term, usually as a consequence of applying one or more digital preservation strategies. 
These strategies may include technology preservation, technology emulation or data migration. 
Source: The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (2002).
Born Digital - digital materials which are created and retained in digital form.
May or may not have a non-digital equivalent. 
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
Digital Repository / Archive - a digital repository is where digital content, assets, are stored and can be searched and retrieved for later use.
A repository supports mechanisms to import, export, identify, store and retrieve digital assets. 
Putting digital content into a repository enables staff and institutions to then manage and preserve it, and therefore derive maximum value from it. 
Digital repositories may include research outputs and journal articles, theses, e-learning objects and teaching materials or research data.
Source: Digital Repositories: Helping universities and colleges. JISC, August 2005.
Master - a faithful digital reproduction of a document, optimized for longevity and for production of a range of delivery versions (derivatives).
Masters are captured at the highest practicable quality or resolution and stored for long-term usage. 
Typically, masters are stored in an off-line mode on tape or CD and are accessed only for the production of derivative images. 
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
Derivative - an image created from the master image, through some kind of image editing process to create a user or working copy.
The process usually involves a loss of information to reduce the size by sampling it to a lower resolution, using lossy compression techniques, or altering an image using image processing techniques.
Typically, derivatives are made for purposes such as web access, including “thumbnail” images, or as “reference” or “service” images that should fit completely within an average monitor. 
Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
Digital images - electronic snapshots taken of a scene or scanned from documents, such as photographs, manuscripts, printed texts, and artwork.
The digital image is sampled and mapped as a grid of dots or picture elements (pixels). 
Each pixel is assigned a tonal value (black, white, shades of gray or color), which is represented in binary code (zeros and ones).
Resolution - a measure of the ability to capture detail in the original work.
The spatial frequency at which a digital image is sampled (the sampling frequency) is often a good indicator of resolution. 
Dots-per-inch (dpi) or pixels-per-inch (ppi) are common and synonymous terms used to express resolution for digital images.
Pixel Dimensions - the horizontal and vertical measurements of an image expressed in pixels.
May be determined by multiplying both the width and the height by the dpi.
Example: an 8" x 10" document scanned at 300 dpi has the pixel dimensions of 2,400 pixels (8" x 300 dpi) by 3,000 pixels (10" x 300 dpi).
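The pixel-dimension rule above can be sketched in a few lines of Python. The function name pixel_dimensions is only illustrative, and rounding to whole pixels is an assumption, since scanners report whole pixel counts.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a document scanned at the given dpi."""
    # Multiply each physical dimension (in inches) by the sampling rate.
    return round(width_in * dpi), round(height_in * dpi)


# The 8" x 10" document from the example, scanned at 300 dpi.
print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
```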
Bit Depth - determined by the number of bits used to define each pixel.
The greater the bit depth, the greater the number of tones (grayscale or color) that can be represented. 
Digital images may be produced in black and white (bitonal), grayscale, or color.
A bitonal image is represented by pixels consisting of 1 bit each, which can represent two tones (typically black and white), using the values 0 for black and 1 for white or vice versa.
A grayscale image is composed of pixels represented by multiple bits of information, typically ranging from 2 to 8 bits or more.
A color image is typically represented by a bit depth ranging from 8 to 24 or higher.
With a 24-bit image, the bits are often divided into three groupings: 8 for red, 8 for green, and 8 for blue. Combinations of those bits are used to represent other colors.
A 24-bit image offers 16.7 million (2^24) color values.
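As a quick check of the figures above, the number of tones a pixel can represent is 2 raised to the bit depth. The sketch below uses an illustrative helper name, tonal_values, to print the count for the three image types discussed.

```python
def tonal_values(bit_depth):
    """Number of distinct tones or colors a pixel of this bit depth can hold."""
    return 2 ** bit_depth


for label, bits in [("bitonal", 1), ("8-bit grayscale", 8), ("24-bit color", 24)]:
    print(f"{label}: {tonal_values(bits):,} values")
# bitonal: 2 values
# 8-bit grayscale: 256 values
# 24-bit color: 16,777,216 values
```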
File Size - calculated by multiplying the surface area of a document (height x width) to be scanned by the bit depth and the dpi squared (dpi^2).
Because image file size is represented in bytes, which are made up of 8 bits, divide this figure by 8.
Formula 1 for File Size: FS = (height x width x bit depth x dpi^2) / 8
Example: Compute the file size of a US-Letter size page captured in 8-bit Grayscale at 100 dpi.
FS = (8.5 x 11 x 8 x 100^2) / 8
FS = 935,000 bytes.
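Formula 1 can be sketched as a small Python function. The name file_size_bytes is only illustrative, and the result is the uncompressed image data size; real files also carry header information, which this sketch ignores.

```python
def file_size_bytes(height_in, width_in, bit_depth, dpi):
    """Uncompressed size in bytes: (height x width x bit depth x dpi^2) / 8."""
    bits = height_in * width_in * bit_depth * dpi ** 2
    return bits / 8


# US-Letter page (8.5" x 11"), 8-bit grayscale, 100 dpi.
print(file_size_bytes(11, 8.5, 8, 100))  # 935000.0
```

The same function shows how quickly size grows with resolution and bit depth: the same page at 300 dpi in 24-bit color comes to 25,245,000 bytes.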
If the pixel dimensions are given, multiply them by each other and the bit depth to determine the number of bits in an image file.
Formula 2 for File Size: FS = (pixel dimensions x bit depth) / 8
Example: Compute the file size of a 24-bit image captured with a digital camera with pixel dimensions of 2,048 x 3,072. 
FS = (2048 x 3072 x 24)/8 
FS = 18,874,368 bytes.
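Formula 2 can be sketched the same way. The function name file_size_from_pixels is illustrative, and the conversion to mebibytes (1 MiB = 1,048,576 bytes) is added here only for readability.

```python
def file_size_from_pixels(width_px, height_px, bit_depth):
    """Uncompressed size in bytes: (pixel dimensions x bit depth) / 8."""
    return width_px * height_px * bit_depth / 8


size = file_size_from_pixels(2048, 3072, 24)
print(size)              # 18874368.0
print(size / 1024 ** 2)  # 18.0 (MiB)
```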
Compression - algorithms designed to reduce the size of the image for storage or transmission.
Lossless schemes (e.g., ITU-T6) abbreviate the binary code without discarding any information, so that when the image is "decompressed" it is bit for bit identical to the original. Most often used with bitonal scanning of textual material.
Lossy schemes (e.g., JPEG) utilize a means for averaging or discarding the least significant information, based on an understanding of visual perception. Typically used with tonal images.
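The "bit for bit identical" property of lossless compression can be illustrated with a round trip. This is only a sketch: it uses zlib (DEFLATE) from the Python standard library as a stand-in, not the ITU-T6 scheme named above, and the input is synthetic data made up of long runs of white and black bytes rather than a real scan.

```python
import zlib

# Synthetic stand-in for bitonal scan data: long white runs, short black runs.
original = (b"\x00" * 200 + b"\xff" * 10) * 100

compressed = zlib.compress(original, 9)  # 9 = highest compression level
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print("bit for bit identical:", restored == original)
```

A lossy scheme such as JPEG would fail the final comparison by design, because it discards information that cannot be restored.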
File Formats - consist of both the bits that comprise the image and header information on how to read and interpret the file.
File formats vary in terms of resolution, bit-depth, color capabilities, and support for compression and metadata.
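The role of header information can be illustrated by looking at the first bytes of a file, which usually carry a format signature. The sketch below is a minimal illustration, not a full format validator; the function name identify_format is an assumption, and only the signatures of formats covered in this presentation are listed.

```python
# Leading byte signatures ("magic numbers") of formats discussed here.
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2)"),
    (b"%PDF-", "PDF"),
    (b"AT&TFORM", "DjVu"),
]


def identify_format(header):
    """Guess the file format from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


print(identify_format(b"II*\x00\x08\x00\x00\x00"))  # TIFF (little-endian)
print(identify_format(b"%PDF-1.4\n"))               # PDF
```

In practice the header would be read from disk, for example with open(path, "rb").read(16).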
Optical Character Recognition (OCR) - a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera into editable and searchable data.
Source: http://finereader.abbyy.com/about_ocr/whatis_ocr/
Quality (usability, functionality) 
Persistence (long-term access) 
Interoperability (e.g., across platforms and software environments) 
Storage Space (file size) 
Storage Hardware 
Storage Media (e.g., DVDs, CDs)
Master copies should be created to the highest technical standards achievable. 
Image formats should be open (non-proprietary) and have published technical specifications available in the public domain.
Image formats should be widely supported by many software applications and operating systems.
Digitize an original or first generation (i.e., print rather than microfilm) of the source material to achieve the best quality image possible. 
Create backup copies of all files on servers and storage media (e.g., DVDs) and have an off-site backup strategy. 
Create meaningful metadata for image files or collections.
Prior to digitization, third-party copyright or other constraints inherent in the record should be resolved.
OCR should be performed on all digital reproductions where the content is primarily textual and computer processed. Collections that are photographic in nature and those not computer processed do not require OCR.
Plan for future technological developments and migration.
Tagged Image File Format (TIFF)
Extensions: .tif, .tiff
Bit-depths: 1-bit bitonal; 4- or 8-bit grayscale or palette color; up to 64-bit color.
Compression: Uncompressed 
◦Lossless: ITU-T.6, LZW, etc. 
◦Lossy: JPEG 
Standard/ Proprietary: De facto standard. 
Web Support: plug-in or external application. 
Supports multiple images/file (multi-page).
Joint Photographic Experts Group (JPEG) / JPEG File Interchange Format (JFIF)
Extensions: .jpg, .jpeg, .jif, .jfif 
Bit-depths: 8-bit grayscale; 24-bit color. 
Compression: Lossless; Lossy: JPEG. 
Standard/ Proprietary: JPEG: ISO 10918-1/2; JFIF: de facto standard.
Web Support: Native since Microsoft Internet Explorer 2, Netscape Navigator 2.
JP2-JPX/ JPEG 2000 
Extensions: .jp2, .jpx, .j2k, .j2c 
Bit-depths: supports up to 2^14 channels, each with 1-38 bits; gray or color.
Compression: Uncompressed 
◦Lossless/Lossy: Wavelet. 
Standard/ Proprietary: JPEG 2000: ISO/IEC 15444 parts 1-6, 8-11.
Web Support: Plug-in.
Portable Document Format (PDF) 
Extension: .pdf 
Bit-depths: 4-bit grayscale; 8-bit color; up to 64-bit color support. 
Compression: Uncompressed 
◦Lossless: ITU-T.6, LZW, JBIG 
◦Lossy: JPEG 
Standard/ Proprietary: De facto standard. 
Web Support: Plug-in or external application. 
Contains OCR text layer.
DjVu, pronounced “day·zha·voo” 
Extension: .djvu 
Bit-depths: 1-bit bitonal, 4- to 8-bit grayscale; 24-bit color support.
Compression: Lossless: JB2, IW44; Lossy. 
Standard/ Proprietary: Emerging standard. 
Web Support: Plug-in or external application. 
Supports multiple images/file (multi-page). 
Contains OCR text layer.
High quality image compression technique:
◦Scanned bitonal: 300 dpi: 5-40K per page (3-10 times better than TIFF/G4).
◦5-10 times better than JPEG or PDF
Image Masters 
◦Preservation / Archive Copy 
◦Uncompressed 
◦Highest possible quality recommended 
Derivatives 
◦Display / Viewing / Reading 
◦Printing 
◦Thumbnails
Image Masters 
◦TIFF 
◦JPEG (if using digital cameras) 
Derivatives / Deliverables 
◦Text/ Documents: PDF, DjVu 
◦Photographs: PNG, DjVu
Black and White 
◦File Format: TIFF 
◦Compression: Uncompressed or Lossless compressed using CCITT Group 4 (ITU-T6) 
◦Bit Depth: 600dpi, bitonal 
Grayscale 
◦File Format: TIFF 
◦Compression: Uncompressed or Lossless compressed using LZW or JPEG2000 
◦Bit Depth: 300dpi, 8-bit grayscale
Color 
◦File Format: TIFF 
◦Compression: Uncompressed or Lossless Compressed using LZW or JPEG2000 
◦Bit Depth: 300dpi, 24-bit color
Thumbnail 
◦File Format: JPEG 
◦Compression: Lossy 
◦Resolution: 72-100 dpi 
View / Service copy 
◦File Format: JPEG / PDF / DjVu 
◦Compression: Lossy 
◦Resolution: 72-100 dpi 
Print Copy (PDF/DjVu) 
◦File Format: PDF / DjVu 
◦Compression: Lossy 
◦Resolution: 100-150 DPI
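Because derivatives are sampled down from the master, their pixel dimensions follow from the ratio of the two resolutions. The sketch below shows only that arithmetic; the function name derivative_dimensions is illustrative, and actual resampling would be done in an image editing tool.

```python
def derivative_dimensions(master_px, master_dpi, target_dpi):
    """Pixel dimensions of a derivative sampled down from a master image."""
    width_px, height_px = master_px
    # Scale each dimension by target_dpi / master_dpi.
    return (round(width_px * target_dpi / master_dpi),
            round(height_px * target_dpi / master_dpi))


master = (2400, 3000)  # an 8" x 10" page captured at 300 dpi
print(derivative_dimensions(master, 300, 100))  # (800, 1000)
print(derivative_dimensions(master, 300, 72))   # (576, 720)
```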
Flatbed Scanner 
◦Best known and largest selling scanner
Sheet Feed Scanner 
◦Use the same basic technology as flatbeds, but maximize throughput, usually at the expense of quality. 
◦Designed for high-volume scanning
Overhead Scanner 
◦High speed book scanner. 
◦Sometimes referred to as “Planetary scanner” 
◦Bound volumes can be placed face up for scanning
V-Shaped Book Scanner 
◦Uses Digital SLR Cameras and a unique v-shaped, auto-adjusting book cradle and platen to capture sharp images at up to 700 pages an hour. 
◦Natively captures flat images. No need for page curvature correction.
Image Capture and Processing 
◦IrfanView (Freeware)
Image capture, conversion, processing 
◦Adobe Acrobat (Proprietary) 
PDF creation, conversion, processing 
OCR 
Watermarks 
◦Document Express Editor (Proprietary) 
DjVu creation, conversion, processing
OCR
Image Capture 
Image Processing 
Quality Control 
Delivery 
Storage and Backup
Document(s) or other materials are captured in digital form using a scanner or digital camera. 
Guidelines and Procedures: 
◦Pre-scanning 
Preparing item level inventory list 
◦Copyright Statement 
Should accompany each digital file. 
If accessed from the web, copyright statement can be displayed on the website (if the same rights apply to all items on the site).
Image editing (if necessary) 
◦Compression of files, sharpening of images, deskewing, image rotation, cropping, deleting and reordering pages. 
Optical Character Recognition 
Creating Derivatives 
Adding Watermarks 
Adding Security (e.g., restrictions on copying, printing, or extraction, and password protection) 
Creation of metadata describing the scanned materials.
What to look for when checking digital images for quality: 
◦Missing pages. 
◦Incorrect order of pages. 
◦Pages of different sizes. 
◦Readability of text. 
◦Black or white areas on some parts of the page that are covering the content.
◦Image not the correct size 
◦Image in wrong resolution 
◦Image in wrong file format
◦Image in wrong mode or bit-depth 
◦Overall light problems (e.g., too dark) 
◦Loss of detail in highlights or shadows 
◦Poor contrasts 
◦Uneven tone or flares 
◦Missing scan lines or dropped-out pixels 
◦Lack of sharpness 
◦Excessive sharpening 
◦Image in wrong orientation
◦Image not centered or skewed
◦Incomplete or cropped images
◦Excessive noise (see dark areas)
◦Misaligned color channels
◦Image processing and scanner artifacts (e.g., extraneous lines, noise, banding)
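Some of the checks above, such as missing pages and incorrect page order, can be automated from the file names alone. The sketch below assumes a hypothetical naming scheme in which each scan ends in an underscore and a page number (for example book_0001.tif); the function name and the scheme are illustrative and not part of any cited guideline. Visual checks such as sharpness, tone and noise still need a person or an imaging tool.

```python
import re


def check_page_sequence(filenames, expected_pages):
    """Return (missing page numbers, True if pages are out of order)."""
    pattern = re.compile(r"_(\d+)\.tiff?$")  # hypothetical naming scheme
    found = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            found.append(int(match.group(1)))
    missing = sorted(set(range(1, expected_pages + 1)) - set(found))
    out_of_order = found != sorted(found)
    return missing, out_of_order


scans = ["book_0001.tif", "book_0002.tif", "book_0004.tif",
         "book_0003.tif", "book_0006.tif"]
missing, out_of_order = check_page_sequence(scans, expected_pages=6)
print("missing pages:", missing)      # missing pages: [5]
print("out of order:", out_of_order)  # out of order: True
```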
The process of getting the scanned images to the user through computer networks/Web, monitors, and printers. 
Delivery Methods 
◦Removable Storage Devices 
◦Optical Media (CDs, DVDs) 
◦Static Web Pages 
◦Digital Repositories
Recommended Digital Repository software: 
◦EPrints
◦DSpace
◦Greenstone
Strategies for storage and backup may include: 
◦Dedicated server or shared storage solution. 
Database Systems 
File-based Systems (FTP, WebDAV, Shared Folders)
◦Writing the digitized records to magnetic tape. 
◦Writing the digitized records to optical media (e.g., CD, DVD).
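Whichever storage strategy is chosen, a backup is only useful if the copy matches the original. One common way to check this is to compare checksums after copying. The sketch below is a minimal illustration using the Python standard library; the function names, the SHA-256 choice and the file name master_0001.tif are assumptions made for the example, not a procedure from the sources cited above.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    return sha256_of(source) == sha256_of(target)


# Demonstration with a throwaway file standing in for a master image.
with tempfile.TemporaryDirectory() as workdir:
    master = Path(workdir) / "master_0001.tif"
    master.write_bytes(b"II*\x00" + b"\x00" * 1024)
    print(backup_and_verify(master, Path(workdir) / "backup"))  # True
```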

Standards and procedure in digitization and digital preservation

  • 2. Are there standards for digitization or digital archiving? Yes, but limited to certain aspects only.
  • 3. ISO/TR 13028:2010 -Information and documentation -Implementation guidelines for digitization of records. Not applicable to: technical specifications for the digital capture of records; technical specifications for the long-term preservation of digital records; or digitization of existing archival holdings for preservation purposes, etc.
  • 4. ISO/TR 19005-1:2005; ISO/TR 19005- 2:2011; ISO/TR 19005-3: 2012, (underdevelopment) -Document management -Electronic document file format for long-term preservation. Specifies how to use the Portable Document Format (PDF) for long-term preservation of electronic documents. Standard is known as PDF/A.
  • 5. Unlike preservation microfilming and photocopying, there are no formal standards that govern the capture, processing, and storage of digital images. There are, however, a number of projects and publications that have set forth best practices for creating high-quality digital images, access systems, and storage systems.
  • 6. Also known as imaging or scanning, is the means of converting hard-copy, or non- digital, records into digital format. Hard-copy or non-digital records include audio, visual, image or text. Digitization may also be undertaken by taking digital photographs of the source records, where appropriate. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: DigitisationStandard (2005).
  • 7. A process by which digital data is preserved in digital form in order to ensure the usability, durability and intellectual integrity of the information contained therein. A more precise definition is: the storage, maintenance, and accessibility of a digital object over the long term, usually as a consequence of applying one or more digital preservation strategies. These strategies may include technology preservation, technology emulation or data migration. Source: The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials (2002).
  • 8. Born Digital - Digital materials which are created and retained in digital form. May or may not have a non-digital equivalent. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  • 9. Digital Repository / Archive - a digital repository is where digital content, assets, are stored and can be searched and retrieved for later use. A repository supports mechanisms to import, export, identify, store and retrieve digital assets. Putting digital content into a repository enables staff and institutions to then manage and preserve it, and therefore derive maximum value from it. Digital repositories may include research outputs and journal articles, theses, e-learning objects and teaching materials or research data. Source: Digital Repositories: Helping universities and colleges. JISC, August 2005.
  • 10. Master - A faithful digital reproduction of a document, optimized for longevity and for production of a range of delivery versions (derivatives). Masters are captured at the highest practicable quality or resolution and stored for long-term usage. Typically, masters are stored in an off-line mode on tape or CD and are accessed only for the production of derivative images. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  • 11. Derivative - an image created from the master image, through some kind of image editing process to create a user or working copy. The process usually involves a loss of information to reduce the size by sampling it to a lower resolution, using lossy compression techniques, or altering an image using image processing techniques. Typically, derivatives are made for purposes such as web access, including “thumbnail” images, or as “reference” or “service” images that should fit completely within an average monitor. Source: Government Recordkeeping Group, Archives New Zealand. Continuum Create and Maintain: Digitisation Standard (2005).
  • 12. Digital images - electronic snapshots taken of a scene or scanned from documents, such as photographs, manuscripts, printed texts, and artwork. The digital image is sampled and mapped as a grid of dots or picture elements (pixels). Each pixel is assigned a tonal value (black, white, shades of gray or color), which is represented in binary code (zeros and ones).
  • 13. Resolution - a measure of the ability to capture detail in the original work. The spatial frequency at which a digital image is sampled (the sampling frequency) is often a good indicator of resolution. Dots-per-inch (dpi) or pixels-per-inch (ppi) are common and synonymous terms used to express resolution for digital images.
  • 14. Pixel Dimensions - the horizontal and vertical measurements of an image expressed in pixels. May be determined by multiplying both the width and the height by the dpi. Example: an 8" x 10" document scanned at 300 dpi has the pixel dimensions of 2,400 pixels (8" x 300 dpi) by 3,000 pixels (10" x 300 dpi).
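The pixel-dimension rule above can be sketched in a few lines of Python. The function name `pixel_dimensions` and its parameter names are illustrative assumptions, not part of the original slides.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a document scanned at the given dpi.

    Each side in inches is multiplied by the sampling rate (dots per inch).
    """
    return round(width_in * dpi), round(height_in * dpi)


# The slide's example: an 8" x 10" document scanned at 300 dpi.
print(pixel_dimensions(8, 10, 300))  # (2400, 3000)
```

Rounding only matters for page sizes that do not multiply out to whole pixels.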
  • 15. Bit Depth - determined by the number of bits used to define each pixel. The greater the bit depth, the greater the number of tones (grayscale or color) that can be represented. Digital images may be produced in black and white (bitonal), grayscale, or color.
  • 16. Bit Depth: A bitonal image is represented by pixels consisting of 1 bit each, which can represent two tones (typically black and white), using the values 0 for black and 1 for white or vice versa. A grayscale image is composed of pixels represented by multiple bits of information, typically ranging from 2 to 8 bits or more.
  • 17. Bit Depth: A color image is typically represented by a bit depth ranging from 8 to 24 or higher. With a 24-bit image, the bits are often divided into three groupings: 8 for red, 8 for green, and 8 for blue. Combinations of those bits are used to represent other colors. A 24-bit image offers 16.7 million (2²⁴) color values.
  • 18. File Size - calculated by multiplying the surface area of a document (height x width) to be scanned by the bit depth and the dpi². Because image file size is represented in bytes, which are made up of 8 bits, divide this figure by 8. Formula 1 for File Size: FS = (height x width x bit depth x dpi²) / 8
  • 19. File Size Example: Compute the file size of a US-Letter size page captured in 8-bit Grayscale at 100 dpi. FS = (8.5 x 11 x 8 x 100²) / 8; FS = 935,000 bytes.
  • 20. File Size If the pixel dimensions are given, multiply them by each other and the bit depth to determine the number of bits in an image file. Formula 2 for File Size FS=(pixel dimensions x bit depth) / 8
  • 21. File Size Example: Compute the file size of a 24-bit image captured with a digital camera with pixel dimensions of 2,048 x 3,072. FS = (2048 x 3072 x 24)/8 FS = 18,874,368 bytes.
  • 22. Compression - algorithms designed to reduce the size of the image for storage or transmission. Lossless schemes (e.g., ITU-T.6) abbreviate the binary code without discarding any information, so that when the image is "decompressed" it is bit for bit identical to the original. Most often used with bitonal scanning of textual material. Lossy schemes (e.g., JPEG) utilize a means for averaging or discarding the least significant information, based on an understanding of visual perception. Typically used with tonal images.
  • 23. File Formats - consist of both the bits that comprise the image and header information on how to read and interpret the file. File formats vary in terms of resolution, bit depth, color capabilities, and support for compression and metadata.
  • 24. Optical Character Recognition (OCR) - a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera into editable and searchable data. Source: http://finereader.abbyy.com/about_ocr/whatis_ocr/
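The "bit for bit identical" property of lossless compression can be demonstrated with Python's standard `zlib` module. Note the substitution: zlib implements DEFLATE, which stands in here for the ITU-T.6 scheme named on the slide, and the byte string is synthetic data imitating a mostly white bitonal page rather than a real scan.

```python
import zlib

# Synthetic stand-in for a bitonal page: long runs of white bytes (0xFF)
# interrupted by short runs of black bytes (0x00).
original = (b"\xff" * 200 + b"\x00" * 8) * 500

compressed = zlib.compress(original, 9)  # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)  # True: nothing was discarded
```

A lossy scheme such as JPEG would not pass the final equality check, because it deliberately discards information to reach smaller sizes.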
  • 25. Quality (usability, functionality); Persistence (long-term access); Interoperability (e.g., across platforms and software environments); Storage Space (file size); Storage Hardware; Storage Media (e.g., DVDs, CDs)
  • 26. Master copies should be created to the highest technical standards achievable. Image formats should be open (non-proprietary) and have published technical specifications available in the public domain. Image formats should be widely supported by many software applications and operating systems.
  • 27. Digitize an original or first generation (i.e., print rather than microfilm) of the source material to achieve the best quality image possible. Create backup copies of all files on servers and storage media (e.g., DVDs) and have an off-site backup strategy. Create meaningful metadata for image files or collections.
  • 28. Prior to digitization, third-party copyright or other constraints inherent in the record should be considered and resolved. OCR should be performed on all digital reproductions where the content is primarily textual and computer processed. Collections that are photographic in nature and those not computer processed do not require OCR. Plan for future technological developments and migration.
  • 29. Tagged Image File Format (TIFF). Extensions: .tif, .tiff. Bit-depths: 1-bit bitonal; 4- or 8-bit grayscale or palette color; up to 64-bit color. Compression: Uncompressed ◦Lossless: ITU-T.6, LZW, etc. ◦Lossy: JPEG. Standard/Proprietary: De facto standard. Web Support: Plug-in or external application. Supports multiple images/file (multi-page).
  • 30. Joint Photographic Experts Group (JPEG) / JPEG File Interchange Format (JFIF). Extensions: .jpg, .jpeg, .jif, .jfif. Bit-depths: 8-bit grayscale; 24-bit color. Compression: Lossless; Lossy: JPEG. Standard/Proprietary: JPEG: ISO 10918-1/2; JFIF: de facto standard. Web Support: Native since Microsoft Internet Explorer 2, Netscape Navigator 2.
  • 31. JP2-JPX / JPEG 2000. Extensions: .jp2, .jpx, .j2k, .j2c. Bit-depths: supports up to 2¹⁴ channels, each with 1-38 bits; gray or color. Compression: Uncompressed ◦Lossless/Lossy: Wavelet. Standard/Proprietary: JPEG: ISO/IEC 15444 parts 1-6, 8-11. Web Support: Plug-in.
  • 32. Portable Document Format (PDF). Extension: .pdf. Bit-depths: 4-bit grayscale; 8-bit color; up to 64-bit color support. Compression: Uncompressed ◦Lossless: ITU-T.6, LZW, JBIG ◦Lossy: JPEG. Standard/Proprietary: De facto standard. Web Support: Plug-in or external application. Contains OCR text layer.
  • 33. DjVu, pronounced “day·zha·voo”. Extension: .djvu. Bit-depths: 1-bit bitonal, 4- to 8-bit grayscale; 24-bit color support. Compression: Lossless: JB2, IW44; Lossy. Standard/Proprietary: Emerging standard. Web Support: Plug-in or external application. Supports multiple images/file (multi-page). Contains OCR text layer.
  • 34. DjVu: High-quality image compression technique: ◦Scanned bitonal at 300 dpi: 5-40K per page (3-10 times better than TIFF/G4). ◦5-10 times better than JPEG or PDF
  • 35. Image Masters ◦Preservation / Archive Copy ◦Uncompressed ◦Highest possible quality recommended Derivatives ◦Display / Viewing / Reading ◦Printing ◦Thumbnails
  • 36. Image Masters ◦TIFF ◦JPEG (if using digital cameras) Derivatives / Deliverables ◦Text/ Documents: PDF, DjVu ◦Photographs: PNG, DjVu
  • 37. Black and White ◦File Format: TIFF ◦Compression: Uncompressed or lossless compressed using CCITT Group 4 (ITU-T.6) ◦Resolution and Bit Depth: 600 dpi, bitonal. Grayscale ◦File Format: TIFF ◦Compression: Uncompressed or lossless compressed using LZW or JPEG2000 ◦Resolution and Bit Depth: 300 dpi, 8-bit grayscale
  • 38. Color ◦File Format: TIFF ◦Compression: Uncompressed or lossless compressed using LZW or JPEG2000 ◦Resolution and Bit Depth: 300 dpi, 24-bit color
  • 39. Thumbnail ◦File Format: JPEG ◦Compression: Lossy ◦Resolution: 72-100 dpi. View / Service copy ◦File Format: JPEG / PDF / DjVu ◦Compression: Lossy ◦Resolution: 72-100 dpi. Print Copy (PDF/DjVu) ◦File Format: PDF / DjVu ◦Compression: Lossy ◦Resolution: 100-150 dpi
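To see what these derivative resolutions mean in pixels, the sketch below scales a master's pixel dimensions by the ratio of target dpi to master dpi. The function name `derivative_pixels` is hypothetical, and the 2,400 x 3,000 master is the 8" x 10" page at 300 dpi from the Pixel Dimensions slide.

```python
def derivative_pixels(master_px, master_dpi, target_dpi):
    """Scale master pixel dimensions to a lower-resolution derivative.

    master_px is a (width, height) tuple in pixels; each side is
    multiplied by target_dpi / master_dpi and rounded to whole pixels.
    """
    return tuple(round(side * target_dpi / master_dpi) for side in master_px)


master = (2400, 3000)  # 8" x 10" page scanned at 300 dpi
print(derivative_pixels(master, 300, 72))   # (576, 720)   view / service copy
print(derivative_pixels(master, 300, 150))  # (1200, 1500) print copy
```

This computes target sizes only; the actual resampling would be done in an image-processing tool.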
  • 40. Flatbed Scanner ◦Best known and largest selling scanner
  • 41. Sheet Feed Scanner ◦Uses the same basic technology as flatbeds, but maximizes throughput, usually at the expense of quality. ◦Designed for high-volume scanning
  • 42. Overhead Scanner ◦High speed book scanner. ◦Sometimes referred to as “Planetary scanner” ◦Bound volumes can be placed face up for scanning
  • 43. V-Shaped Book Scanner ◦Uses Digital SLR Cameras and a unique v-shaped, auto-adjusting book cradle and platen to capture sharp images at up to 700 pages an hour. ◦Natively captures flat images. No need for page curvature correction.
  • 44. Image Capture and Processing ◦IrfanView (Freeware): image capture, conversion, processing ◦Adobe Acrobat (Proprietary): PDF creation, conversion, processing; OCR; watermarks ◦Document Express Editor (Proprietary): DjVu creation, conversion, processing; OCR
  • 45. Image Capture; Image Processing; Quality Control; Delivery; Storage and Backup
  • 46. Document(s) or other materials are captured in digital form using a scanner or digital camera. Guidelines and Procedures: ◦Pre-scanning: preparing an item-level inventory list ◦Copyright Statement: should accompany each digital file. If accessed from the web, the copyright statement can be displayed on the website (if the same rights apply to all items on the site).
  • 47. Image editing (if necessary) ◦Compression of files, sharpening of images, deskewing, image rotation, cropping, deleting and reordering pages. Optical Character Recognition; Creating Derivatives; Adding Watermarks; Adding Security (e.g., restrictions on copying, printing, or extraction, and password protection); Creation of metadata describing the scanned materials.
  • 48. What to look for when checking digital images for quality: ◦Missing pages. ◦Incorrect order of pages. ◦Pages of different sizes. ◦Readability of text. ◦Black or white areas on some parts of the page that are covering the content. ◦Image not the correct size ◦Image in wrong resolution ◦Image in wrong file format
  • 49. What to look for when checking digital images for quality: ◦Image in wrong mode or bit-depth ◦Overall light problems (e.g., too dark) ◦Loss of detail in highlights or shadows ◦Poor contrasts ◦Uneven tone or flares ◦Missing scan lines or dropped-out pixels ◦Lack of sharpness ◦Excessive sharpening ◦Image in wrong orientation
  • 50. What to look for when checking digital images for quality: ◦Image not centered or skewed ◦Incomplete or cropped images ◦Excessive noise (see dark areas) ◦Misaligned color channels ◦Image processing and scanner artifacts (e.g., extraneous lines, noise, banding)
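Two of the checks in the quality lists above, missing pages and incorrect page order, lend themselves to simple automation. The sketch below assumes a hypothetical file-naming scheme in which each scan's name ends in its page number (for example `page_0003.tif`); the scheme and the function name `check_page_sequence` are illustrative and do not come from the slides.

```python
import re


def check_page_sequence(filenames):
    """Report missing page numbers and out-of-order files.

    Assumes each name ends in a page number followed by an extension,
    e.g. 'page_0003.tif', and that the list is in capture order.
    """
    pages = [int(re.search(r"(\d+)\.\w+$", name).group(1)) for name in filenames]
    expected = set(range(min(pages), max(pages) + 1))
    missing = sorted(expected - set(pages))
    out_of_order = [
        filenames[i] for i in range(1, len(pages)) if pages[i] < pages[i - 1]
    ]
    return missing, out_of_order


scans = ["page_0001.tif", "page_0002.tif", "page_0005.tif", "page_0004.tif"]
print(check_page_sequence(scans))  # ([3], ['page_0004.tif'])
```

Visual problems such as poor contrast, skew, or noise still need inspection of the images themselves.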
  • 51. The process of getting the scanned images to the user through computer networks/Web, monitors, and printers. Delivery Methods ◦Removable Storage Devices ◦Optical Media (CDs, DVDs) ◦Static Web Pages ◦Digital Repositories
  • 52. Recommended Digital Repository software: ◦EPrints ◦DSpace ◦Greenstone
  • 53. Strategies for storage and backup may include: ◦Dedicated server or shared storage solution: database systems; file-based systems (FTP, WebDAV, shared folders) ◦Writing the digitized records to magnetic tape. ◦Writing the digitized records to optical media (e.g., CD, DVD).