University Library of Granada presentation at "Succeed in Digitisation. Spreading Excellence" Conference. Validation and take-up of text digitisation tools.
2. University Library of Granada, Spain
Our University
• About 55,000 undergraduate students
• More than 6,000 postgraduate students
• Almost 6,000 lecturers and clerical staff
• 28 Faculties & Schools, 123 departments, 18 research centres
• 77 degrees, 98 official master's programmes
Our Library
• Origin of the library: 1526 (1531: university)
• Holdings from donations until the 19th century
• Old books collection: incunabula, imprints, manuscripts
• 21 branches
• 1,100,000 printed monographs and more than 6,000 electronic books
• 50,000 electronic journals
• 156 databases
• More than 15,000 documents in Digibug
3. AlchemyAPI
• Objective: to select keywords automatically in Digibug
• Tool: AlchemyAPI, Text API method (TextGetRankedKeywords)
• Collection and documents:
- Research (9,197), journal (3,758), learning (65), old books collection (14,099), official papers (1,690)
- Chronology: from the 15th century to the 20th century
• Languages used: Spanish, English and Italian
• Staff: one librarian; time: one week; interface: 2 days
• Corpus processed: 192 master's projects, 200 journal articles and 10 documents of other types
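The keyword selection described above can be sketched as a single GET request per document. This is a minimal sketch, not the code used in the project: the endpoint URL and the parameter names (`apikey`, `text`, `outputMode`, `maxRetrieve`) are recalled from the now-retired AlchemyAPI service and should be treated as assumptions, and the sample response is made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the retired AlchemyAPI service; verify before relying on it.
ENDPOINT = "http://access.alchemyapi.com/calls/text/TextGetRankedKeywords"


def build_keyword_request(api_key, text, max_keywords=10):
    """Return a GET URL asking for the ranked keywords of `text` as JSON."""
    params = {
        "apikey": api_key,            # assumed parameter name
        "text": text,                 # the abstract to analyse
        "outputMode": "json",         # assumed parameter name
        "maxRetrieve": max_keywords,  # assumed parameter name
    }
    return ENDPOINT + "?" + urlencode(params)


def parse_keywords(response_body, min_relevance=0.5):
    """Keep (keyword, relevance) pairs at or above `min_relevance`."""
    data = json.loads(response_body)
    keywords = []
    for item in data.get("keywords", []):
        relevance = float(item["relevance"])
        if relevance >= min_relevance:
            keywords.append((item["text"], relevance))
    return keywords


# Hypothetical response body, invented for illustration (not real service output).
SAMPLE_RESPONSE = (
    '{"status": "OK", "keywords": ['
    '{"text": "digital repository", "relevance": "0.93"},'
    '{"text": "library", "relevance": "0.41"}]}'
)
```

The relevance threshold is a hypothetical tuning knob: raising it yields fewer, more specific keywords per record.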
4. Evaluation Results: AlchemyAPI
• Text size: 150 Kb Abstracts
• Spanish abstracts: poor results
• English abstracts: the results improved substantially
• USABILITY
- GET-type form: nothing needs to be installed on the PC
- Free API key: limited to 1,000 daily transactions and 5 concurrent requests
- Guides and documentation: very useful
• IMPROVEMENTS
- Spanish language
- Text size
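The limits listed on this slide can be respected on the client side. The sketch below assumes that the "150 Kb" figure is an upper bound on the text sent per request and reads it as 150 × 1024 bytes; `send` is a hypothetical callable standing in for the actual API call.

```python
from concurrent.futures import ThreadPoolExecutor

DAILY_LIMIT = 1000            # free-key transactions per day (from the slide)
MAX_CONCURRENT = 5            # free-key concurrent requests (from the slide)
MAX_TEXT_BYTES = 150 * 1024   # assumption: "150 Kb" read as 150 KiB per request


def truncate_utf8(text, max_bytes=MAX_TEXT_BYTES):
    """Cut `text` so its UTF-8 encoding fits in `max_bytes` without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


def daily_batches(documents, daily_limit=DAILY_LIMIT):
    """Split the corpus into chunks that each fit in one day's quota."""
    return [documents[i:i + daily_limit]
            for i in range(0, len(documents), daily_limit)]


def process_batch(batch, send):
    """Send one day's batch with at most MAX_CONCURRENT requests in flight."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        return list(pool.map(lambda doc: send(truncate_utf8(doc)), batch))
```

With these limits, the 402 documents of the evaluated corpus fit in a single daily batch. Truncating by bytes rather than characters matters for Spanish text, where accented letters take two bytes in UTF-8.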
5. Scan Tailor
• Objective: to improve the visual appearance of the material, especially tilt correction, page splitting and stain removal
• Tool:
- Scan Tailor, version 3.0.9.11.1
- Input and output format: TIFF
• Selection:
- 11 documents from the old books collection
- Several aspects were taken into account: font type, state of conservation, layout, etc.
- Languages: Castilian, Latin, Greek and French
- Font types: roman and capital letters
- Text types: italic, roman
• Color mode is needed to preserve the original features, and neither stain removal nor area filling is advisable if fidelity is to be preserved. However, since we had to evaluate Scan Tailor, we used those options to clean up the defects.
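To illustrate what tilt correction involves, the sketch below estimates the skew angle of a page from the positions of its dark pixels. It is not Scan Tailor's algorithm: it uses a generic projection-profile heuristic, swapped in here for illustration, in which the correct angle is the one that concentrates the ink of each text line into the fewest rows. The function name and inputs are hypothetical.

```python
import math


def estimate_skew(ink_pixels, candidates):
    """Estimate page skew in degrees.

    ink_pixels: list of (x, y) coordinates of dark pixels.
    candidates: angles in degrees to try.
    Returns the candidate whose row projection profile is sharpest.
    """
    best_angle, best_score = None, -1
    for angle in candidates:
        slope = math.tan(math.radians(angle))
        rows = {}
        for x, y in ink_pixels:
            # Undo the vertical drift that this angle would cause at column x.
            r = y - round(x * slope)
            rows[r] = rows.get(r, 0) + 1
        # Sum of squares rewards profiles with a few tall peaks (aligned text lines).
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Rotating the page by the negative of the returned angle would then straighten the text lines. A real tool has to cope with illustrations, stamps and stains, which this sketch ignores.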
6. Evaluation Results: Scan Tailor
• Compared with Photoshop, Scan Tailor is better at:
- Tilting
- Staining removal
- Page splitting: it detects the dividing line in 95% of cases, even when the page is heavily tilted
• However, Scan Tailor is limited with color images, so it does not fit our needs
• USABILITY:
- The area-filling utility doesn't match our objectives
- But the geometrical correction works quite well
- The automatic content selector is not always effective: when there are stamps, watermarks, signatures or blank pages, the manual selector is recommended
• IMPROVEMENTS:
- A viewer to compare the original image and the processed one
- A "restore values" option
- An option to save at any step of the process
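The page-splitting step evaluated on this slide can be illustrated with a simple heuristic. This is a sketch under stated assumptions, not Scan Tailor's method: it assumes that the gutter between the two pages shows up as the darkest vertical band near the centre of a double-page scan, and it represents the image as a plain list of rows of grayscale values. Both function names are hypothetical.

```python
def find_split_column(image, search_fraction=0.5):
    """Return the column in the central band with the lowest total brightness.

    image: list of rows, each a list of grayscale values
           from 0 (black) to 255 (white).
    search_fraction: share of the width, centred, in which to look for the gutter.
    """
    height = len(image)
    width = len(image[0])
    margin = int(width * (1 - search_fraction) / 2)
    best_col, best_sum = None, None
    for x in range(margin, width - margin):
        col_sum = sum(image[y][x] for y in range(height))
        if best_sum is None or col_sum < best_sum:
            best_col, best_sum = x, col_sum
    return best_col


def split_pages(image, split_col):
    """Cut a double-page image into left and right pages at `split_col`."""
    left = [row[:split_col] for row in image]
    right = [row[split_col:] for row in image]
    return left, right
```

A vertical cut like this fails when the gutter itself is tilted, which is the case where the slide reports that Scan Tailor still finds the dividing line.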