
KU Leuven - Words and numbers - ICoC



Presentation of KU Leuven's digitisation work on historical languages, given at the Impact Centre of Competence Annual General Meeting

Published in: Technology

  1. Words and numbers: KU Leuven University Library, Central Services, Digitisation
  2. Intro: KU Leuven Digitisation • University Library Central Services • Digitisation projects and programmes o Research, education, heritage o Coordination, facilitation • Imaging Lab o Focus on quality o Focus on innovation
  3. Intro: LIBIS • IT solutions for collection management o Archives, libraries, museums o Development and support for a network larger than KU Leuven alone o LIAS • Solutions for researchers o Scientific data management, collaboration, sharing o Multiple environments • Centre of expertise • Project oriented
  4. Lines of Work and Issues • Output formats • Historical languages: Latin • Historical languages: Demotic and friends • Printed statistical tables • Manuscripts and handwritten materials • Workflow management
  5. Output formats • SUCCEED • OCR engines generate TEI that does not use all features of the standard • This reduces the value of OCR-generated TEI as a starting point for research • Looking for: o A way to improve the quality of TEI generated by OCR engines • Possible input: o LIBIS expertise and know-how
  6. Historical languages: Latin • Course notes by students of the old university of Leuven • Western Europe: Latin is essential for historical research • Efforts are fragmented, hard to track, and cooperation is difficult to establish • Looking for: o Highly automated and accurate OCR = limited manual intervention o Lexica, NER • Possible input: o Text material from different periods and locations o Academic input: Neo-Latin, …
  7. Historical languages: other • Latin is not the only important historical language • Precursors of contemporary spoken languages • No specific projects for now • Certainly important for our researchers, Hebrew for instance • Looking for: o Initiatives we might join
  8. Printed statistical tables • Recensement général des industries et des métiers (31 octobre 1896) • Nineteenth-century statistical material • Very hard to use for research due to sheer size and complexity • Solution: digitisation followed by OCR • Output: spreadsheets or functional equivalents • Looking for: o Extremely accurate OCR for numeric materials o Correct translation of dense table layout o Tools for preparation of the digitised images and quality control • Possible input: o Digitised source material o Expertise: Depts of Electrical Engineering, Economic History, Historical Demography
  9. How to deal with complex layouts, columns and digits?
  10. Manuscripts and handwritten material • RICH + Bible of Anjou • Ready to contribute material as content holder • Working on a programme about letters
  11. Workflow management • Digicorder + Teamwork • How do others deal with workflow management? • Where to position enrichment in the digitisation workflow? • Ready to participate in the production of webinars
  12. Digicorder = tool to manage the naming of projects and scans. Created by Diederik Lanoye using FileMaker.
  13. Options when creating unique names for scans and corresponding labels. Starting point = object to be digitised. Label = description of part of the object, or number of the page or folio.
  14. Names for scans and corresponding labels
  15. Information shown for each scanned image
  16. Teamwork = workflow management tool. The dashboard lists projects, tasks, milestones and responsibilities.
  17. Inside a project: tasks on a timeline
  18. Milestones are defined for important moments in the workflow, often at transitions. More information:
  19. You never walk alone o Issues are not specific to KU Leuven o Sharing expertise to cover all aspects is the only way to go o Valuable expertise in specific fields • Neo-Latin and humanist Latin • Historical demography and economic history • Imaging o On our wishlist: • Cooperation in new and ongoing developments • Exchange of expertise • Above all: action
  20. Cooperation • Wiki as a starting point, interesting initiative • Who wants to join forces? o Writing projects together o Searching for funding • Important: o Automated o Accurate o Scalable and maintainable o Cost-effective • Hoping to return to Leuven with names, specific suggestions, and appointments for meetings to discuss proposals
  21. Appendix: Center for Processing Speech and Images • The Center for Processing Speech and Images (PSI) is one of the units within the Department of Electrical Engineering (ESAT) at KU Leuven. It specialises in computer vision, with object and object-class recognition among its most important research domains. Besides more general goals such as scene understanding, segmentation and invariant object recognition, it has experience with character recognition on licence plates and with automatic recognition of handwritten music scores for transcription into modern music notation. With more than 60 researchers it is one of the biggest research groups of its kind in Europe, and it has extensive experience in national and international projects. Two professors have received ERC grants of the European Commission and have won several other prestigious prizes.
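
Slide 13 describes deriving unique scan names from the object to be digitised, with a label for each part, page or folio. The following is a minimal Python sketch of that idea only; it is not Digicorder's actual FileMaker implementation, which the slides do not show. The object identifier `MS1234`, the underscore separator, the zero-padded sequence number, and the names `Scan` and `name_scans` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """One scanned image: a unique file name plus a human-readable label."""
    name: str
    label: str


def name_scans(object_id: str, labels: list[str], digits: int = 4) -> list[Scan]:
    """Derive sequential, unique scan names from the object being digitised.

    The naming pattern (object id + "_" + zero-padded sequence number) is a
    hypothetical convention, not one taken from the presentation.
    """
    if not object_id or any(c in object_id for c in " /\\"):
        raise ValueError("object_id must be non-empty, without spaces or slashes")
    return [
        Scan(name=f"{object_id}_{seq:0{digits}d}", label=label)
        for seq, label in enumerate(labels, start=1)
    ]


# Hypothetical object with three images: a cover and both sides of folio 1.
scans = name_scans("MS1234", ["front cover", "f. 1r", "f. 1v"])
for scan in scans:
    print(scan.name, "->", scan.label)
```

Keeping the name purely sequential and storing the page or folio description in a separate label means the file names stay sortable and unique even when the foliation of the object is irregular.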