Royal Netherlands Academy of Arts and Sciences (KNAW)
International Institute of Social History (IISG)
Library Applications Workflow
Vyacheslav Tykhonov
mailto: email@example.com
October 18, 2012

Software Tools Overview
- Evergreen library system (core) with external applications developed at IISG
- Digital Repository to store metadata and files (images, video, audio, etc.)
- OCR service to convert images to text
- VisualMets Viewer to browse scans
- HiTiME project for Named Entity Recognition
- Search (VuFind) as the interface to linked metadata

Evergreen Applications Overview
- Charts Builder
- GeoLocator
- Visual Timelines
- Custom Reports
- Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), for VuFind, WorldCat, ...
- ISBN reader
- Related bibliographic records finder
- Authority linking application

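As an illustration of how a harvester such as VuFind might request records over OAI-PMH, the sketch below builds `ListRecords` request URLs. The verb and argument names (`ListRecords`, `metadataPrefix`, `from`, `set`, `resumptionToken`) come from the OAI-PMH specification; the base URL and the function names are hypothetical, since the slides do not give the actual endpoint.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not state the real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a first (non-resumed) request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec  # selective harvesting by set
    return base_url + "?" + urlencode(params)


def resume_url(base_url, token):
    """Build the follow-up request; resumptionToken excludes all other arguments except verb."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2012-10-18"))
```

A harvester would fetch the first URL, parse the XML response, and keep calling the follow-up URL for as long as the response carries a non-empty resumption token.
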
OCR Service
- Optical character recognition (OCR) application that converts scanned images of handwritten, typewritten or printed text into machine-encoded text
- Texts can be stored in the Digital Repository as a separate layer and used for further analysis
- The OCR service can recognize more than 40 languages with high accuracy
- It can be trained to work with other languages too
- High recognition speed (1-2 seconds per page)

HiTiME Project
HiTiME is a text analysis system for the recognition and extraction of historical events and facts from historical sources and archives.
Named Entity Recognition process:
- Persons (Dora Russell, Karl Marx, ...)
- Locations (Amsterdam, the Netherlands, ...)
- Dates (October 18, 2012, ...)
All named entities will be stored in a Knowledge Base and can be linked; links between persons can be used to build social networks.

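To make the three entity types concrete, here is a toy sketch of entity tagging using fixed name lists and a date pattern. It is not HiTiME's method, which the slides describe only at a high level; the name lists, the date format and the function name `find_entities` are assumptions chosen to match the examples on the slide.

```python
import re

# Toy gazetteers built from the slide's own examples (assumption: exact-match lookup).
PERSONS = {"Dora Russell", "Karl Marx"}
LOCATIONS = {"Amsterdam", "the Netherlands"}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates written like "October 18, 2012".
DATE_RE = re.compile(r"\b(?:%s) \d{1,2}, \d{4}\b" % MONTHS)


def find_entities(text):
    """Return (label, value) pairs in order of appearance in the text."""
    found = []
    for label, names in (("PERSON", PERSONS), ("LOCATION", LOCATIONS)):
        for name in sorted(names):
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), label, match.group()))
    for match in DATE_RE.finditer(text):
        found.append((match.start(), "DATE", match.group()))
    # Sort by character offset, then drop the offset.
    return [(label, value) for _, label, value in sorted(found)]


if __name__ == "__main__":
    sample = "A lecture on Karl Marx was held in Amsterdam on October 18, 2012."
    print(find_entities(sample))
```

A real system would replace the exact-match lookup with a trained model, but the output shape (typed entities that can be stored and linked) is the same idea.
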
IISG Resources for HiTiME (Machine Learning)
- Training on Authority Records from Evergreen can improve the accuracy and recall of Named Entity Recognition (NER)
- Evergreen MARC21 records can be used for Topic Detection and Tracking (for example, the 6XX Subject Access Fields)
- IISG archives and collections can be used to create a corpus of related documents

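As a rough sketch of how subject terms could be pulled from MARC21 records for topic detection, the example below collects subfield `a` from every 6XX field. The record layout (a list of tag and subfield pairs), the sample record and the function name `subject_terms` are simplifications assumed for illustration; a real pipeline would read records through a MARC library or from Evergreen directly.

```python
def subject_terms(record):
    """Collect the main heading (subfield a) of every 6XX Subject Access Field."""
    terms = []
    for tag, subfields in record:
        if tag.startswith("6"):  # 6XX block: subject access fields
            for code, value in subfields:
                if code == "a":
                    # Drop trailing MARC punctuation such as the final period.
                    terms.append(value.rstrip(" ."))
    return terms


# Illustrative record only; not taken from the IISG catalogue.
SAMPLE_RECORD = [
    ("100", [("a", "Marx, Karl,"), ("d", "1818-1883.")]),
    ("245", [("a", "Das Kapital.")]),
    ("650", [("a", "Capitalism.")]),
    ("651", [("a", "Netherlands"), ("x", "History.")]),
]

if __name__ == "__main__":
    print(subject_terms(SAMPLE_RECORD))
```

The resulting terms could serve as topic labels for the documents attached to each record.
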
Questions?
International Institute of Social History (IISG)
Royal Netherlands Academy of Arts and Sciences (KNAW)
Digital Infrastructure Department (DI)
Vyacheslav Tykhonov, Library Systems Developer
mailto: firstname.lastname@example.org