This deck documents the digitization workflow as practiced by CinnamonTeal Publishing, a provider of digitization and archival services based in India. Comments are welcome.
2. Aims of Digitization
1. Access: Through the use of efficient metadata, digitized works (including
books, manuscripts, letters, newspapers, and other such material) can be
easily accessed
2. Preservation: Works under threat of being destroyed can be preserved for
future reference
3. Handling: Handling of rare works is minimized
4. Circulation: Old works can be circulated among readers
5. Organization: Digitized works can be easily organized for access and
dissemination
3. Workflow:
Step 1. Choose Items to Digitize
The need to choose items to digitize arises when not all material can be
digitized. The following steps need to be taken at this point.
1) Carefully select which items (books, manuscripts, letters, memos,
newspapers) should be digitized. This selection is based on project
requirements and the equipment available for digitization.
4. Workflow:
Step 1. Choose Items to Digitize
2) Factors influencing which items are chosen for digitization include
a) Relevance to research
b) Age of item (and hence urgency to digitize)
c) Rarity of item (in general, or within the local area)
d) Condition of item, especially its fragility
3) Selected items should be set aside so that other items are not disturbed.
4) Any required permissions to digitize should be obtained.
5. Workflow:
Step 1. Choose Items to Digitize
Other considerations during this stage:
- Costs of the project
- Do it yourself or outsource
- Collaborate with other institutions to increase the scope or reduce the
costs of the project
6. Workflow:
Step 2. Prepare Items to Digitize
- This involves removing the material from the shelves and making it
available for digitization
- Documenting all material that is set aside for digitization
- This step also involves:
  - Dusting the material carefully to remove dust and specks of dirt that may interfere with the
  digitized image
  - Removing staples, folders, etc.
  - “Combining” documents that belong together (or similar media, like photos, pages, etc.)
  into units (for which a cover sheet is designed with important information about the unit)
  - Documenting the current state of damage/decay of the material
7. Workflow:
Step 3. Scanning of Works (1 of 3)
- This step includes:
  - Choosing the right type of scanner for various formats
  - Determining the parameters that will be adhered to (please see following slides)
  - Choosing rules for nomenclature of files/folders
  - Capture of images
- A colour scale must be used where necessary to ensure accurate
reproduction of colours. This fits with the larger need to accurately
calibrate the image capturing device prior to commencement of this step
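The nomenclature rule for files and folders mentioned above could be captured in a small helper so that every operator produces identical names. The sketch below is only an illustration: the scheme (a collection code, a four-digit unit number, a four-digit page number) and the function name `scan_path` are assumptions, not the convention used in the deck.

```python
from pathlib import PurePosixPath


def scan_path(collection: str, unit: int, page: int, ext: str = "tif") -> PurePosixPath:
    """Build a folder/file path for one scanned page.

    Hypothetical scheme: <collection>/unit_<NNNN>/<collection>_<NNNN>_<PPPP>.<ext>
    Zero-padding keeps files in page order when sorted by name.
    """
    if unit < 1 or page < 1:
        raise ValueError("unit and page numbers start at 1")
    folder = f"unit_{unit:04d}"
    filename = f"{collection}_{unit:04d}_{page:04d}.{ext}"
    return PurePosixPath(collection) / folder / filename
```

For example, `scan_path("COLL01", 3, 12)` yields `COLL01/unit_0003/COLL01_0003_0012.tif`. Fixing the rule in code before scanning starts makes it harder for individual files to drift from the agreed pattern.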
8. Workflow:
Step 3. Scanning of Works (2 of 3)
- The resulting image should have a resolution of 300-600 dpi. The choice
within this range also depends on the storage space available.
- The image can be captured at a higher resolution when the material is in poor condition and
detail is important (e.g. old maps), and for transparencies and negatives
- The scanned image should be saved in the (uncompressed) RAW or TIFF
format before being converted to JPEG or other formats.
- Scanned images should be saved in folders according to the prescribed nomenclature
- The scanned image should NOT be edited in any manner whatsoever. Any
additions or modifications can be made on a copy of the images or after
conversion to other formats, e.g. the JPEG format
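Since the choice of resolution depends on available storage, a rough size estimate helps when planning. The sketch below assumes an uncompressed image with 24 bits per pixel (8-bit RGB) and ignores file headers and embedded metadata; the function name and defaults are illustrative only.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int,
                       bits_per_pixel: int = 24) -> int:
    """Approximate size of one uncompressed scan in bytes.

    Pixels per side = physical size in inches * dots per inch.
    Assumes 24-bit colour by default; real TIFF/RAW files add some overhead.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def project_gigabytes(pages: int, width_in: float, height_in: float, dpi: int) -> float:
    """Approximate storage for a whole project, in gigabytes (10**9 bytes)."""
    return pages * uncompressed_bytes(width_in, height_in, dpi) / 1e9
```

Because the pixel count grows with the square of the resolution, moving from 300 dpi to 600 dpi roughly quadruples the storage needed per page.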
9. Workflow:
Step 3. Scanning of Works (3 of 3)
- During the process of scanning books/newspapers/papers:
  - The works must be placed under glass.
  - The page should not be allowed to spill over the table on which it is placed. A bigger table
  may be required in such a case. No attempt must be made to scale down the image
  - The camera should cover the entire page (including the margins) being scanned.
  Sometimes two cameras can be used to capture opposing pages
  - Blank pages should also be scanned in the order they appear in the book
  - Gloves should be used by the person performing the scanning
  - Material must not be left unattended on the scanning bed
  - Before moving to the next page, the operator should ensure that the image is clear and
  “can be read”
10. Workflow:
Step 4. Quality Control
- This step involves checking for different parameters before the digitized
item is returned to its place
- Parameters that must be checked include:
  - Sequence of pages
  - Readability of text (sharpness)
  - Resolution, colour depth
  - Coverage of image (whether the full image is captured)
  - Reproduction of colour
  - Orientation of text
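Of the checks listed above, the page sequence is the easiest to automate. The sketch below assumes that each file name ends in an underscore followed by the page number (for example `COLL01_0003_0012.tif`); that naming pattern and the function name are hypothetical. Checks such as sharpness or colour reproduction still need a human reviewer or dedicated imaging tools.

```python
import re
from collections import Counter

# Assumed pattern: "..._<page number>.<extension>" at the end of the name.
PAGE_RE = re.compile(r"_(\d+)\.[A-Za-z0-9]+$")


def page_sequence_report(filenames):
    """Report missing and duplicated page numbers for one unit.

    Pages are expected to run from 1 to the highest page number found.
    Names that do not match the assumed pattern are listed as 'unparsed'.
    """
    pages, unparsed = [], []
    for name in filenames:
        match = PAGE_RE.search(name)
        if match:
            pages.append(int(match.group(1)))
        else:
            unparsed.append(name)
    counts = Counter(pages)
    highest = max(counts) if counts else 0
    missing = [p for p in range(1, highest + 1) if p not in counts]
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    return {"missing": missing, "duplicates": duplicates, "unparsed": unparsed}
```

A report with empty `missing`, `duplicates` and `unparsed` lists suggests the unit was captured in full and in order, at least as far as file names can show.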
11. Workflow:
Step 5. Recording of Metadata
- Before the book is put back on the shelf, metadata should be recorded
carefully (see following slide). (Another option is to record metadata prior
to capture of image in the case of each unit of material.)
- Both the current location of the book and its provenance (to the
extent possible) should be accurately recorded
- The metadata should describe each page as well as the larger collection to
which each page belongs
- The technical details of the capturing device should also be recorded
12. Metadata
- As applied to the archival process, four kinds of metadata need to be
recorded:
  - Bibliographic metadata, related to the document and material being digitized. Includes,
  among other things, the languages and scripts used, whether the material is still under
  copyright, and whether it contains sensitive data.
  - Structural metadata, which describes the document structure, such as a table of contents
  - Administrative metadata, which provides information such as usage rights, current
  custodian, provenance and custodial history
  - Technical metadata, which contains the technical parameters of a digital copy and
  provides information on the file type, file size, resolution, location of copies, etc.
- In addition, other information, such as that related to the equipment used,
software employed, and even the location of capture, is recorded
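The four kinds of metadata above map naturally onto a structured record. The sketch below is one possible layout, not a standard schema: the class names and field names are assumptions chosen to mirror the items named on the slide (the `title` and `unit_id` fields are added purely for illustration).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class BibliographicMetadata:
    title: str                      # assumed field, not named in the deck
    languages: List[str]
    scripts: List[str]
    under_copyright: bool
    contains_sensitive_data: bool


@dataclass
class StructuralMetadata:
    table_of_contents: List[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    usage_rights: str
    current_custodian: str
    provenance: str


@dataclass
class TechnicalMetadata:
    file_type: str
    file_size_bytes: int
    resolution_dpi: int
    copy_locations: List[str]
    capture_device: str             # equipment used
    software: str                   # software employed
    capture_location: str


@dataclass
class UnitRecord:
    unit_id: str                    # hypothetical identifier for one unit
    bibliographic: BibliographicMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata
    technical: TechnicalMetadata


def to_json(record: UnitRecord) -> str:
    """Serialize a record to JSON, keeping non-ASCII text readable."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)
```

Keeping the four groups as separate nested objects makes it easier to map each group later onto whatever fields a repository or archival management system expects.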
13. Workflow:
Step 5. Recording of Metadata
- The last part of this step involves facilitating the creation of descriptive and
administrative metadata across preservation (such as DSpace™) and
management (such as ArchivesSpace™) systems, and presenting these
on a website.
14. Workflow:
Step 6. Copying and Storing
- As mentioned earlier, the naming convention for files and folders
must be established before scanning begins
- Files are stored in folders alone. No “loose” file is stored.
- An MD5 checksum manifest is included in each folder for error detection
- The original RAW/TIFF files may be copied if the images have to be
modified. The checksum manifest should be used to ensure that the files
have been copied correctly
- Files are stored on a hard disk. The use of CDs/DVDs for storage of
scanned data is avoided
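The MD5 checksum manifest described above can be produced and checked with a few lines of standard-library code. The sketch below is a minimal illustration: the manifest file name `manifest-md5.txt` and the one-line-per-file layout (`<checksum>  <file name>`) are assumptions, not the format used in the deck.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "manifest-md5.txt"  # hypothetical manifest file name


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder):
    """Write '<md5>  <file name>' for every file in the folder (not recursive)."""
    folder = Path(folder)
    lines = [
        f"{md5_of(item)}  {item.name}"
        for item in sorted(folder.iterdir())
        if item.is_file() and item.name != MANIFEST_NAME
    ]
    manifest = folder / MANIFEST_NAME
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(folder):
    """Return the names of files that are missing or whose checksum differs."""
    folder = Path(folder)
    failures = []
    text = (folder / MANIFEST_NAME).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, name = line.split("  ", 1)
        target = folder / name
        if not target.is_file() or md5_of(target) != expected:
            failures.append(name)
    return failures
```

After copying a folder together with its manifest, running `verify_manifest` on the copy and getting an empty list indicates that every listed file arrived intact.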
15. Workflow:
Step 7. Modification & Dissemination
- The copies of the digitized images may be modified as per requirements
- They may also be processed with OCR software
- These images, when converted to PDF/A format, can be made available
to readers
- Dissemination can be done through software systems such as DSpace™
and Islandora™. These systems enable easy storage and search-based
retrieval of scanned objects.
- Metadata can be accessed through hosted systems such as
ArchivesSpace™.
16. About Us
- Based in Goa, India
- Local partner for EAP636, a project of the British Library’s Endangered
Archives Programme
- Have digitized more than 400,000 pages to date
- Material digitized includes books, manuscripts, letters, newspapers,
notepads, photos, photo negatives and microfiche
- Equipment used includes high-end DSLRs, photo-scanners and book
scanners
- Services include development of repository systems such as DSpace™
and Islandora™, as well as metadata databases such as ArchivesSpace™.
- Can be contacted at contactus@cinnamonteal.in / +91 98503 98530