
Care henk vd Heuvel



poster at the CLARIAH-2016 day

Published in: Science


CARE: Curation of Dutch Regional Dialect Dictionaries
Nicoline van der Sijs, Henk van den Heuvel, Roeland van Hout, Eric Sanders
CLS/CLST, Radboud University Nijmegen, The Netherlands

Aim of project
• Definition of a generic database structure for dialect dictionaries (LMF)
• Link the structure to Woordenboek van de Vlaamse Dialecten (WVD) and other regional dictionaries
• Curation of Woordenboek van de Brabantse Dialecten (WBD) and Woordenboek van de Limburgse Dialecten (WLD), Parts I and II
• Update curation of WBD and WLD, Part III
• Include Woordenboek van de Gelderse Dialecten (WGD)

Where we start
• OCR version of PDF files (WBD & WLD, Parts I and II)
• Formerly curated TSV files for WBD & WLD, Part III
• FP5 files of WGD

What we deliver
• Generic LMF model for dialect dictionaries
• WBD, WLD as CSV files and LMF files
  - For at least 32 of 42 books of Parts I and II
  - For all 28 books of Part III
• Original PDFs of books
• CMDI files per Part
• Curation Reports

Generic aspects
• LMF model suited for all sorts of dialect dictionaries
• CMDI metadata profile
• Very flexible LMF conversion script

Workflow components: PDF book, CSV files, LMF files, CMDI files, CLARIN Data Centre

Manual preprocessing
By trained assistants, whose work is gratefully acknowledged: Aukje Borkent, Maaike Borst, Eline Dimmendaal, Jorik van Engeland and Inge Otto
- Addition of typographic codes for Comments (“Toelichting”) in the text file
- Correcting script errors

CSV script
- Converts the typographically coded text file into a CSV file by:
  - Typographic & text cleaning
  - Categorization of information based on typography
  - Recoding dialect forms
  - Checking and expanding Kloeke codes
- A log file is used for iterative manual correction

LMF script
- Converts the CSV file into LMF

CMDI
- Metadata profile includes:
  - Link to LMF

CLARIN Data Centre: Meertens Institute
• Adding Persistent Identifiers
• Storage

The CARE project is funded by CLARIN-NL under grant number 15-004.
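The poster says only that the LMF script converts a CSV file into LMF and gives no details of how. The following is a minimal Python sketch of what such a conversion could look like; it is not the project's actual script. The column names (lemma, dialect_form, kloeke_code), the feature name kloekeCode, the choice of one LexicalEntry per lemma with one WordForm per dialect form, and the sample rows are all assumptions made for illustration. The element names LexicalResource, Lexicon, LexicalEntry, Lemma, WordForm and feat follow common LMF XML usage.

```python
import csv
import io
import xml.etree.ElementTree as ET


def add_feat(parent, att, val):
    """Attach an LMF-style <feat att="..." val="..."/> element to parent."""
    ET.SubElement(parent, "feat", {"att": att, "val": val})


def csv_to_lmf(csv_text, lexicon_name):
    """Build a small LMF-like XML tree from CSV text.

    Assumed (hypothetical) columns: lemma, dialect_form, kloeke_code.
    """
    root = ET.Element("LexicalResource")
    lexicon = ET.SubElement(root, "Lexicon")
    add_feat(lexicon, "name", lexicon_name)

    entries = {}  # lemma -> its LexicalEntry element
    for row in csv.DictReader(io.StringIO(csv_text)):
        lemma = row["lemma"].strip()
        entry = entries.get(lemma)
        if entry is None:
            # First time this lemma is seen: create its entry and Lemma node.
            entry = ET.SubElement(lexicon, "LexicalEntry")
            add_feat(ET.SubElement(entry, "Lemma"), "writtenForm", lemma)
            entries[lemma] = entry
        # Each CSV row contributes one dialect form with its location code.
        word_form = ET.SubElement(entry, "WordForm")
        add_feat(word_form, "writtenForm", row["dialect_form"].strip())
        add_feat(word_form, "kloekeCode", row["kloeke_code"].strip())
    return root


# Invented sample rows, for illustration only.
SAMPLE_CSV = """lemma,dialect_form,kloeke_code
huis,hoes,Q095
huis,huus,L381
boom,boum,Q095
"""

lmf_root = csv_to_lmf(SAMPLE_CSV, "example")
print(ET.tostring(lmf_root, encoding="unicode"))
```

Grouping rows by lemma keeps all dialect variants of a headword under a single entry, which is one plausible reading of a dictionary-oriented LMF model; the real CARE model may structure entries differently.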
