Tim Keefe - DRI Training Series: 2. Digitising Your Collection

Presentation given by Tim Keefe, Head of Digital Resources and Imaging Services (DRIS) at Trinity College Dublin, on March 15th, 2016 in the Royal Irish Academy, Dublin, as part of the DRI Training Series 'Preparing Your Collection for DRI'. This seminar introduces attendees to the basics of digitising heritage material, efficient workflows and some information on equipment requirements, as well as file format compatibility with DRI.

Published in: Education

  1. DRI Training: Preparing Your Collection for DRI
      2. Digitising Your Collection
      Digital Imaging – Introduction, components, process.
      Tim Keefe, Head of Digital Resources and Imaging Services, Trinity College Dublin, keefet@tcd.ie
  2. Questions we all need to ask
      When beginning a digitization project it is easy to ignore the basic questions, the ones we all assume we know the answers to. However, these questions are often the most important, and need to be addressed formally.
  3. Questions to ask
      • What is the purpose of this project?
      • What is the scope of the digitization activity?
      • What is the intended lifetime of the digital files?
      • Who is the intended audience?
  4. Purpose
      • What is the purpose of this project?
      • Why are we digitizing the material?
          • Need/Trend
          • Access
          • Research
          • Education
      • Who are the champions for this project?
          • Local
          • External
      • Who or what are the barriers to the implementation of this project?
          • Human
          • Resource
          • Procedural/Political
  5. Scope
      • What is the scope of the digitization activity?
          • What is to be digitized?
          • What is not to be digitized?
          • Why?
          • Who is likely to demand operation outside of these criteria?
  6. Intended Audience
      • Who is the intended audience for the digital resources?
          • What are their needs?
          • How will they access the material?
          • Who else will be interested?
          • Are you prepared for a new audience (known or unknown) to self-select to become the primary audience?
          • Do you wish to prevent any audience from having access to the resources?
  7. Image Lifetime
      • What is the intended lifetime for the digital records?
          • This question is critical to the appropriate development of the digitization activity
          • Significant resource implications
          • Significant planning implications
          • Significant digitization process implications
  8. So Why Digitize?
      • Access
          • Electronic mediums provide the most dynamic access
          • Digital data structures offer the opportunity for truly dynamic new research and educational models, adding unique new capabilities to existing methodologies
      • Preservation
          • Digital files designed to proper specifications can be true surrogates for delicate source materials for all but a handful of advanced research needs
      • Manipulation
          • Non-linear
          • Digital resources allow for easy modification of image characteristics
          • Digital files easily cross medium boundaries, providing opportunities for new use models
  9. Problems with digitization
      • Pace of technological change is constantly raising the bar for digital attributes
      • Not human readable
      • Lack of best practices / attribute recommendations
      • Long term digital preservation is a newly emerging field, with solutions just beginning to emerge
          • Much more complex than having IS Services make a backup copy
      • Extremely costly activity
          • Total cost of ownership (TCO) not well understood, few models
  10. Capture for What?
      • At TCD we designate the capture activity according to the intent for the object
      • Capturing for Content
          • Speed and cost most important
          • Quality less important
      • Capturing the Object
          • Quality most important
          • Meeting the needs of the researcher… researching anything
  11. Components
      • The primary components of an average imaging system:
          • Digital capture device
          • Light source, if not included in the capture system
          • Optics, if not included in the capture system
          • Color calibration system
          • Image capture/image processing computer system(s)
          • Software packages
          • Data storage systems
  12. Digital Capture Systems
      • Scanners
  13. Digital Capture Systems
      • Flatbed
          • Reflective/transmissive capabilities
          • Infrared dust and scratch removal systems (ICE)
          • Linear/Tri linear or CCD systems
          • Low productivity
          • Inclusive of software
  14. Digital Capture Systems
      • Flatbed (limitations)
          • Works best with two-dimensional materials.
          • Not recommended for use with fragile or tightly bound material.
          • Limited scan area.
          • Very slow
  15. Digital Capture Systems
      • 35mm Photographic
  16. Digital Capture Systems
      • Digital Photographic systems
          • 35mm format
          • CCD / CMOS digital capture sensors
          • Full frame or reduced frame sensors
          • 1.33 to 1.5 avg. magnification values for reduced frame sensors
          • High productivity
          • Limited resolution
          • Limited bit depth (8-14 bit)
          • Cost effective
          • Good starting solution
  17. Digital Capture Systems
      • Medium format (MF digital back)
  18. Digital Capture Systems
      • Medium format (MF digital back)
          • CCD sensors
          • 6 x 4.5cm to 6 x 7cm sensor size
          • With and without micro-lenses
          • High bit depth (16 bit)
          • High productivity
          • High cost
          • Requires high level of studio photographic experience
          • Additional software needs.
          • Associated equipment also expensive
  19. Digital Capture Systems
      • Dedicated Book Scanning Systems
          • One size fits all… and all its limitations
          • Limited source material input
          • Material handling and support
          • Possible automation: page turning, image management
          • Linear or CCD based
          • Digital camera based
          • High to very high productivity
  20. Digital Capture Systems
      • Dedicated Book Scanning Systems
          • Linear CCD based, generally with included software (flatbed in a different form factor)
  21. Digital Capture Systems
      • Dedicated Book Scanning Systems
          • Digital camera based
          • Robotic scanners
  22. Robots…Really?
  23. Computer Technology
      • What to buy
          • Image processing is one of the more intensive computing tasks
          • Recommendation is to buy the fastest, most modern computer that you can afford right now
          • Memory requirements are often more critical than processor speed (software does not yet take full advantage of multi-core technology)
          • Graphics card often more important than processor
          • Have a minimum RAM of 4x your largest file size… 8x recommended
          • Will cost 2-5x more than a normal office computer
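
The RAM rule of thumb above (4x the largest file size as a minimum, 8x recommended) can be turned into a quick estimate. The sketch below assumes the largest file is an uncompressed raster whose size is width x height x channels x bytes per channel; the function names and the example capture dimensions are hypothetical, not from the presentation.

```python
def uncompressed_size_bytes(width_px, height_px, channels=3, bits_per_channel=16):
    """Approximate size of an uncompressed raster image in bytes."""
    bytes_per_channel = (bits_per_channel + 7) // 8  # round up to whole bytes
    return width_px * height_px * channels * bytes_per_channel


def ram_guideline_gib(largest_file_bytes):
    """Return (minimum, recommended) RAM in GiB using the 4x / 8x rule of thumb."""
    gib = 1024 ** 3
    return (4 * largest_file_bytes / gib, 8 * largest_file_bytes / gib)


# Hypothetical example: an 8000 x 6000 pixel, 16-bit RGB capture.
size = uncompressed_size_bytes(8000, 6000)
minimum, recommended = ram_guideline_gib(size)
print(f"file: {size / 1024**2:.0f} MiB, "
      f"RAM: {minimum:.2f} GiB minimum, {recommended:.2f} GiB recommended")
```

Layered working files can be considerably larger than the flat capture, so this is best read as a lower bound.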
  24. Computer Technology
      • Consider the software needs of the digital capture system you have chosen.
      • Is software for generating the files required by your project scenario or device type?
          • Some MF camera systems require unique software
      • Will it be necessary to purchase additional image editing software packages (e.g. Adobe Creative Suite/Photoshop) or file management software (Lightroom, Bridge, etc.)?
          • Many of these software packages are now subscription based
  25. Storage Technology
      • RAID (Redundant array of inexpensive disks)
          • Level 0 (striped) – Speed and performance increases
              • Data is broken up and written across several disks, taking advantage of multiple writing heads to improve data throughput (often used for video processing)
          • Level 1 (mirrored) – Security through redundancy
              • Data is identically written to more than one disk, allowing for backup protection should any single disk fail
              • The overall data storage volume of a two-disk system is halved when RAID level 1 is activated
      • Local hard drive (under the desk solution)
          • Low cost, lowest preservation (use only when required)
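
The capacity trade-off between the two RAID levels described above can be sketched as follows. This is a simplified model that assumes identical disks and treats a RAID 1 set as every disk holding a full copy; `raid_usable_capacity` is a hypothetical helper name.

```python
def raid_usable_capacity(level, disk_count, disk_size_tb):
    """Usable capacity in TB for RAID 0 (striped) or RAID 1 (mirrored)."""
    if disk_count < 2:
        raise ValueError("RAID 0 and RAID 1 need at least two disks")
    if level == 0:
        # Striped: all capacity is usable, but there is no redundancy.
        return disk_count * disk_size_tb
    if level == 1:
        # Mirrored: every disk holds the same data, so only one disk's worth is usable.
        return disk_size_tb
    raise ValueError("only RAID 0 and RAID 1 are modelled here")


for level in (0, 1):
    print(f"RAID {level}, 2 x 4 TB disks: {raid_usable_capacity(level, 2, 4)} TB usable")
```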
  26. Digital Vocabulary
      • File Structure
          • File types
          • Compression
          • Spatial resolution
          • Bit depth
          • Dynamic range
          • Color mode
  27. File Types
      • Tiff (Tagged Image File Format)
          • Large file size
          • Standard format
          • Lossless LZW compression (and lossy options)
      • Jpeg (Joint Photographic Experts Group)
          • Smaller file sizes
          • Lossy compression in most cases; the newest versions support lossless compression, but this is rarely supported by software
          • Standard format
      • Jpeg 2000 (lossless and/or lossy)
          • Multiple file sizes embedded within a single digital record
          • Emerging format (adoption very slow, caution)
  28. File Types cont.
      • PDF (Portable Document Format - Adobe Acrobat)
          • Advanced cross-platform compatibility
          • Ability to support complex document generation
              • Text, images, notes, embedded graphics, etc.
          • Support for advanced printing
          • Support for sharing and dissemination
          • Standard file type
              • Caution, as there are a wide variety of versions and variants
          • Digital preservation ISO standard: PDF/A files
              • Adoption rate very low
              • Some believe that political / corporate influence drove the recommendation of this standard
      • GIF
          • Dying file format, not recommended
  29. File Compression
      Two basic types of compression: lossy and lossless
      • Lossy
          • Image structure is changed (damaged) by the compression activity, but not in a perceptible way
          • Jpeg is the most common format using lossy compression
          • Every file save increases the damage
              • File conversion/save into a lossy format should always be the final step in the digitization and image processing process
          • Large reduction in file size
  30. File Saving
      • Save Order
          • When working with files that use or will use lossy compression (Jpeg), it is important that the very last step in the process is the file save
          • Each save recompresses the data and causes further image degradation
          • It is best practice to work in a lossless format such as Tiff, and save out the final Jpeg as a last step. This workflow will minimize the impact of the compression artifacts
  31. Compression cont.
      • Lossless
          • Image file structure is not changed in any way by the compression activity
          • The Tiff file format with LZW compression is the most widely used lossless compression format
          • Note: the Tiff file format can also be generated with no compression or with lossy compression
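
The defining property of lossless compression, that decompressing returns exactly the original data, can be checked directly. The sketch below uses Python's standard `zlib` module (DEFLATE) as a stand-in, since LZW is not in the standard library; the sample scanline is invented for illustration.

```python
import zlib


def roundtrip_is_lossless(data: bytes) -> bool:
    """Compress and decompress, then confirm every byte is unchanged."""
    return zlib.decompress(zlib.compress(data)) == data


# Hypothetical 8-bit greyscale scanline: a run of paper white followed by a tonal ramp.
pixels = bytes([255] * 1000 + list(range(256)))
compressed = zlib.compress(pixels)

print(len(pixels), "bytes ->", len(compressed), "bytes")
print("identical after round trip:", roundtrip_is_lossless(pixels))
```

A lossy codec such as Jpeg would fail the equality check, which is why the lossy save belongs at the very end of the workflow.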
  32. Compression examples
  33. Resolution
      • This metric is generally stated as pixels per inch (ppi): the number of individual picture elements along one linear inch
      • This is sometimes confused with dots per inch (dpi), which is a printing-specific metric
      • Spatial resolution requires both dimensional measurements and the ppi sample rate
      • Screen resolution is nominally 72 ppi (newest technology screens now exceed 125 ppi)
      • High resolution commercial printing requires 300-650 ppi image files
      • General internet jpg files: 72-150 ppi
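
Because spatial resolution ties physical dimensions to the ppi sample rate, pixel dimensions follow from simple multiplication. A minimal sketch, with hypothetical function names and an example page size chosen for illustration:

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel dimensions needed to capture a physical area at a given ppi."""
    return round(width_in * ppi), round(height_in * ppi)


def physical_size_inches(width_px, height_px, ppi):
    """Physical size at which an image reproduces at a given ppi."""
    return width_px / ppi, height_px / ppi


# Hypothetical example: an 8 x 10 inch page captured at 300 ppi.
width_px, height_px = pixel_dimensions(8, 10, 300)
print(width_px, "x", height_px, "pixels")
# The same file shown at a nominal 72 ppi screen resolution.
print(physical_size_inches(width_px, height_px, 72))
```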
  34. Bit Depth
      • Bit depth determines the number of discrete values available within each image channel (RGB, CMYK)
          • This term is often confused with dynamic range
          • They are not the same; however, there is an interaction between them
      • The number of discrete steps between black and white
  35. Bit Depth
      • Bit depth is stated in the number of bits of data per channel
      • The number of steps is 2 (binary measure) raised to the power of the bit depth, so 4 bit color will have 16 steps between the black and white values
      • Note that bit depth is stated either as the number of bits per channel, as in 8 bit color, or as the sum of all the channels combined (R+G+B) = 24 bit color… this can be confusing
  36. Bit Depth
      • 8 bits per channel (or 24 bit color)
          • 256 value steps in each channel
          • 16.8 million possible colors
      • 16 bits per channel (or 48 bit color)
          • 65536 value steps in each channel
          • 281.5 trillion possible colors
      • Many manufacturers talk about interim bit depths (12-14), but the final output is often reduced to 8 bits per channel
      • You cannot add missing data by moving to a higher bit depth
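
The figures quoted on the bit depth slides follow directly from powers of two. A short sketch (function names are hypothetical) that reproduces them:

```python
def levels_per_channel(bits_per_channel):
    """Number of discrete steps between black and white in one channel."""
    return 2 ** bits_per_channel


def total_colors(bits_per_channel, channels=3):
    """Number of distinct colors across all channels combined."""
    return levels_per_channel(bits_per_channel) ** channels


for bits in (4, 8, 16):
    print(f"{bits} bits/channel ({bits * 3} bit color): "
          f"{levels_per_channel(bits):,} steps, {total_colors(bits):,} colors")
```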
  37. Dynamic Range
      • Dynamic range is the ability of a sensor to simultaneously capture dark detail and light detail
          • This is an inherent weakness of digital capture
          • Decisions are made to set the device to support either a greater tonal range of dark densities (more common) or light
      • Commonly confused with bit depth
          • They are separate characteristics despite all the contrary information out there (much of it from reputable sources)… I promise
          • Greater bit depth will not automatically provide greater dynamic range (however, improvements in bit depth often accompany other sensor improvements that include increased DR)
  38. Dynamic Range
      • Clipping
          • Clipping is a failure state of a digital image, in which the limited dynamic range of a device is unable to correctly capture either very light or very dark tones
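
One simple way to flag clipping is to count how many pixel values sit at the extreme ends of the scale. The sketch below assumes a flat list of single-channel values and treats exactly 0 or exactly the maximum as clipped; real quality checks may use a small margin instead, and the function name is hypothetical.

```python
def clipped_fraction(pixels, bits_per_channel=8):
    """Return (shadow_fraction, highlight_fraction) of values at the extremes."""
    if not pixels:
        raise ValueError("no pixel values supplied")
    max_value = 2 ** bits_per_channel - 1
    shadows = sum(1 for p in pixels if p == 0)
    highlights = sum(1 for p in pixels if p == max_value)
    return shadows / len(pixels), highlights / len(pixels)


# Hypothetical channel data: two crushed shadows and one blown highlight out of four.
sample = [0, 0, 10, 255]
print(clipped_fraction(sample))
```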
  39. Color Mode
      • RGB (Red/Green/Blue color channels)
          • Additive color
          • Most common color mode for digital images
          • Mimics human visual system
  40. Color Mode
      • CMYK (Cyan/Magenta/Yellow/Black)
          • Subtractive color
          • Commercial printing standard
          • Most desktop color printers support RGB color files (CMYK conversion is internally managed)
          • Limited color gamut
  41. Color Mode
      • Lab color
          • Single luminance (grey scale) channel and 2 opposing color channels
          • Loosely represents the range of human vision
          • Good for transforms
  42. Color Profile Standards
      • The user defined color profile assigned to the image files supports several informal standard configurations
      • sRGB
          • Profile developed more than a decade ago by HP and Microsoft. Represents the gamut of an average CRT monitor
          • Very limited color palette
          • New output devices are currently capable of exceeding this space
          • Most commonly used profile (usually the default if not stated)
  43. Color Profile Standards
      • Adobe RGB 1998
          • Newer profile designed to support a wider palette of colors for higher quality printing
          • Lower use than sRGB, but well recognized
          • Maintains a color appearance consistent with sRGB devices
      • ProPhoto RGB
          • A wide gamut color space designed for very high quality printing of photographic images
          • Color appearance is highly inconsistent when used with devices that are not color managed, or that are set to sRGB standards
          • Despite the benefits of this color space, its use is quite limited due to the setup and management requirements
          • Use with caution, as inaccurate color characteristics can occur with improperly managed devices
  44. Image Processing
      Post capture modifications and manipulations to the original digital image file structure
  45. The Controversy
      • Two primary schools of thought
          • The digital master image files should remain untouched as they emerge from the capture device, and all subsequent processing should occur only on the surrogates
          • Image processing will occur on the master capture file with the intent of matching the original source material as closely as possible at the time of capture
  46. Color Mode
      • RGB
          • Standard image space for files
          • Common, not likely to change
      • CMYK
          • Avoid this space for all but specific commercial printing activities (even then try to ignore it)
      • Lab
          • Great for processing transforms that can benefit from a luminance channel
              • Sharpening
              • Noise removal
          • No color profile
  47. File Formats
      • Master
          • This is the high quality large image generated from the capture device
      • Surrogates
          • These are secondary files generated from the master file to be used for specific purposes
  48. File Format Sets
      • Master
          • Tiff
          • This is intended to be the highest quality image
          • Represents the asset derived from the € spent
          • Lossless compression recommended
      • Compressed Jpgs
          • File size reduced for easier management and dissemination, and to manage costs
          • Lossy compression is acceptable within the use cases
          • Often several sizes (large, small, thumbnail)
          • Used for public display
  49. Image Manipulations
      • Tone Scale
          • To adjust tone scale you need to push or pull predetermined black and white values to defined positions on the histogram
          • This requires the use of a calibrated reference target placed within the image
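
Pushing or pulling measured black and white values to defined positions can be modelled as a linear remapping. This is a minimal sketch of one possible approach, not the method used in any particular imaging package; the measured and aim values in the example are invented, and real aim values come from the documentation of the reference target in use.

```python
def tone_scale(value, measured_black, measured_white,
               aim_black, aim_white, max_value=255):
    """Linearly remap a pixel value so the target patches land on their aim values."""
    scale = (aim_white - aim_black) / (measured_white - measured_black)
    adjusted = aim_black + (value - measured_black) * scale
    # Keep the result inside the valid range for the bit depth.
    return min(max_value, max(0, round(adjusted)))


# Hypothetical readings: black patch measured at 30, white patch at 230,
# with assumed aim values of 12 and 242.
for v in (30, 130, 230):
    print(v, "->", tone_scale(v, 30, 230, 12, 242))
```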
  50. Image Manipulations
      • Sharpening
          • Sharpening works by increasing the contrast between edges in an image. This change in contrast fools the human visual system into believing that the image is sharper
  51. Image Manipulation
      • Sharpening
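
The edge-contrast effect described above can be seen on a single row of pixel values. The sketch below uses a basic unsharp mask (original plus the difference between original and a blurred copy), which is one common sharpening technique and not necessarily the one shown in the slides; the 3-sample blur and the sample values are assumptions for illustration.

```python
def box_blur(signal):
    """3-sample box blur; the edge samples are repeated at the borders."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]


def unsharp_mask(signal, amount=1.0, max_value=255):
    """Add back the difference between the signal and its blurred copy."""
    blurred = box_blur(signal)
    return [min(max_value, max(0, round(s + amount * (s - b))))
            for s, b in zip(signal, blurred)]


# Hypothetical row of pixels crossing a dark-to-light edge.
edge = [50, 50, 50, 200, 200, 200]
sharpened = unsharp_mask(edge)
print(edge)
print(sharpened)
```

Only the two samples either side of the edge change: the dark side gets darker and the light side lighter, while flat areas are left alone.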
  52. Cropping
      • Cropping is the permanent removal of unwanted parts of the image
      • Formally determine where the borders of your images should be
          • For research purposes the entire page should be represented
          • For access and content related scanning, cropping to the textual areas of the page may be desired
      • Failure modes
          • What determines that a crop or image capture is unacceptable, requiring reprocessing or a new capture?
          • Formalize this
  53. Skew/Rotation
      • When the source material is not square to the edges of the digital image
      • Failure mode
          • Determine what percent is unacceptable
          • Formalize these criteria
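
A formalized skew criterion can be as simple as measuring the angle of a page edge and comparing it against an agreed tolerance. The sketch below expresses skew in degrees rather than percent; the 1 degree tolerance and the function names are placeholders, since the acceptable limit is a project decision.

```python
import math


def skew_angle_degrees(x1, y1, x2, y2):
    """Angle between a nominally horizontal page edge and the image's x axis."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def skew_acceptable(angle_degrees, tolerance_degrees=1.0):
    """True when the measured skew is within the agreed tolerance."""
    return abs(angle_degrees) <= tolerance_degrees


# Hypothetical measurement: the top edge of a page drops 40 pixels over 3000 pixels.
angle = skew_angle_degrees(0, 0, 3000, 40)
print(f"skew: {angle:.2f} degrees, acceptable: {skew_acceptable(angle)}")
```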
  54. White Balance
      • White balance is a color balancing function used to address the color differences imparted by varying light sources.
      • The human visual system does this automatically in the brain, removing the real color cast imparted by the source illuminant and giving us the perception that most lights are white.
          • Think of the differences evident when you have a desktop incandescent bulb in a room lit by fluorescent light
      • This is also important in the environment where your image processing occurs
  55. White Balance
      • Most white balance is preset within the capture system; however, fine tuning or custom profiles can be applied in the processing stage
      • Neutral 18% grey references are used to generate a custom balance
      • When adjusting tone scale in Photoshop, neutral grey adjustment can be used to correct white balance inconsistencies
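
The idea of balancing against a neutral grey reference can be sketched as per-channel scaling: find the gains that make the grey patch read equal in R, G and B, then apply them to every pixel. This is a simplified illustration, not how any specific capture system or Photoshop implements it, and the patch readings are invented.

```python
def gray_reference_gains(ref_r, ref_g, ref_b):
    """Per-channel gains that make a neutral grey patch read equal in R, G and B."""
    if min(ref_r, ref_g, ref_b) <= 0:
        raise ValueError("reference readings must be positive")
    target = (ref_r + ref_g + ref_b) / 3
    return target / ref_r, target / ref_g, target / ref_b


def apply_white_balance(pixel, gains, max_value=255):
    """Scale each channel of an (R, G, B) pixel by its gain, clamped to range."""
    return tuple(min(max_value, round(c * g)) for c, g in zip(pixel, gains))


# Hypothetical grey patch reading with a warm (reddish) cast.
gains = gray_reference_gains(140, 120, 100)
print(apply_white_balance((140, 120, 100), gains))
```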
  56. Quality Control/Assurance
      Imaging and image processing are a highly repetitive, human dependent set of processes and are therefore highly susceptible to regular error
  57. Control vs. Assurance
      • Control consists of in-process activities to ensure quality in the creation of the products (digital images)
      • Assurance is focused on an evaluation of the processes used and generally takes place outside of the creation process
  58. Quality Control
      • Processes built into the imaging workflow to ensure that the creation of digital images is
          • Consistent
          • Accurate
          • Repeatable
      • Often automated, these processes are inherently part of the imaging workflow
  59. Quality Assurance
      • The Quality Assurance Audit
          • Formal: informal just does not work
          • Existing toolsets developed for a variety of manufacturing based industries are highly effective
              • TQM
              • Six Sigma
              • Etc.
          • Takes place fully outside of the imaging processes
  60. Quality Assurance Testing
      • What to test for
          • Imaging
          • File structure metrics
              • Naming, page counts
              • System/Network (positioning, backup etc.)
          • Metadata
              • Structure
              • Accuracy
              • Completeness
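
Checks on naming and page counts lend themselves to automation. The sketch below assumes a hypothetical naming convention of `<identifier>_<4-digit page>.tif`; the pattern, the function name and the sample filenames are illustrative only and would need to match a project's own conventions.

```python
import re

# Hypothetical convention: collection identifier, underscore, 4-digit page number.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+_(\d{4})\.tif$")


def check_batch(filenames, expected_pages):
    """Return a list of problems found in file naming and page coverage."""
    problems = []
    pages = []
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            problems.append(f"bad name: {name}")
        else:
            pages.append(int(match.group(1)))
    missing = sorted(set(range(1, expected_pages + 1)) - set(pages))
    if missing:
        problems.append(f"missing pages: {missing}")
    return problems


print(check_batch(["MS58_0001.tif", "scan3.jpg"], expected_pages=2))
```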
  61. Color Management
      One of the most critical, and often ignored, components of a successful digitization project is a well planned color management strategy
  62. Color Management
      Within any imaging and processing system you need to ensure that consistent color is displayed from device to device, and that a file's color metrics are electronically recognized
      • Technology Required
          • Capture reference targets
          • Color profiles / ICC
  63. Color Reference Targets
      • Allows a formal measured reference to be associated with the image (future proofing)
  64. Color Management Technology
      • Color meters (basic screen calibration)
          • Absorptive measurements
          • Less dynamic than spectrophotometers
      • Spectrophotometers (advanced CM)
          • Can measure the intensity of light as a function of the wavelength of the light
              • Light absorption
              • Diffuse
              • Specular
  65. CM Standards
      • ICC (International Color Consortium)
          • Works through a standardized Color Matching Module (CMM) connection space
          • Not an ideal solution, but one that has been very well adopted by most imaging related hardware and software vendors
      • ColorSync (Apple Computer)
          • Apple solution to color management
          • Part of the Macintosh system software
          • Generally plays well with others; occasionally some fiddling is necessary (ICC integrated)
          • Hands off approach
  66. Further Reading and Resources
      • DRI and Digital File Format Choices Factsheet: http://dri.ie/sites/default/files/files/dri-factsheets-file-formats.pdf
      • DRI Long-Term Digital Preservation Factsheet: http://tinyurl.com/hbp28xe
      • Online Resources for Digitisation Projects: http://dri.ie/digitisation-resources (includes resources for Project Planning, File Formats, Audio & Audiovisual, Hardware, Metadata & Vocabularies and Policy)
      • Trinity College Dublin Digital Collections Repository: https://www.tcd.ie/Library/dris/digital.php
