LREC Ton vd Wouden

Poster at the CLARIAH-2016 day

Frank Landsbergen, Carole Tiberius & Roderik Dernison; Institute for Dutch Lexicology, Leiden, The Netherlands; {frank.landsbergen, carole.tiberius, roderik.dernison}

Taalportaal: an online grammar of Dutch and Frisian

Taalportaal workflow

The input for Taalportaal consists of grammatical texts in XML. Most Taalportaal authors write directly in XML using the Taalportaal editing environment. Because of their origin, however, the materials of the Syntax of Dutch follow a different route: they consist of Word documents that are automatically converted to XML. The Taalportaal authors, who are based at different institutes and universities throughout the country, store their grammatical texts in a central Subversion repository. This data is copied to a local file system and, together with the database of terms and the database of bibliographic references, forms the core of the Taalportaal web application. The whole process, from retrieving the data from the Subversion repository to displaying it online, is fully automated, so that the data can be updated at regular intervals.

Taalportaal website

The core of the Taalportaal website consists of a large number of topics: short, independently readable texts on a specific grammatical subject from Dutch or Frisian phonology, morphology or syntax. Elements of a Taalportaal page:
(1) search box
(2) left panel with metadata, table of contents, and links to the bibliography and glossary
(3) breadcrumbs, showing the position of the current page in the table-of-contents hierarchy
(4) right panel with the options zoom, comment, print, email and cite (not available in the current beta version)
(5) short introductory description of what the topic is about
(6) main text of the topic
(7) reference list and links to related topics (not available in the current beta version)

Author infrastructure

The content of Taalportaal is created by a team of authors.
Content is, where possible, taken from existing descriptions of Dutch and Frisian and updated where needed. Since the original sources generally have a linear structure, they have been rewritten to make them more suitable for the internet. Authors use the XML editor oXygen, which has been customized to the specific needs of the project. This editor is linked to an extensive database of bibliographic references, which authors can use to insert references into their texts and to add or edit entries. This database was filled by merging the references from several textbooks on Dutch phonology, morphology and syntax. Authors use a Taalportaal-XML schema, based on the DITA format, which enforces a topic-based approach to writing.

What is Taalportaal?

Taalportaal will create an online portal containing an exhaustive and fully searchable electronic grammar of Dutch and Frisian phonology, morphology and syntax. Its content will be in English.

Why Taalportaal?
• Currently, no comprehensive scientific online grammar of Dutch and Frisian exists.
• Taalportaal will serve the scientific community by organizing, integrating and completing the grammatical knowledge of both languages, and by making this data accessible in an innovative way.
• The digital design of the portal enables interoperability between the linguistic categories of phonology, morphology and syntax on the one hand, and between the two languages on the other. The portal's rich cross-linking will benefit these domains of research, which are now often studied in isolation.

Taalportaal consortium
[Figure: Taalportaal consortium and sponsor]

Taalportaal timeline (2011 to 2016): start of the project; beta version online; delivery of final version.

Dutch syntax

All data for the module of Dutch syntax is taken directly from the existing (and almost completed) Syntax of Dutch, an up-to-date and comprehensive syntactic description of Standard Dutch.
Since the Syntax of Dutch is written in Word and covers thousands of pages, it must be converted to XML before it can be added to the portal. This conversion is done with a Perl script written especially for this purpose. Because the converted material can still contain errors, all texts are checked manually before they appear on the portal site. Currently, one of the three parts of the Syntax of Dutch has been converted and published on the Taalportaal website.
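The conversion step can be pictured with a minimal Python sketch. It is not the project's Perl script: it assumes the Word document has already been reduced to (style name, text) pairs, and the style names in the hypothetical `STYLE_TO_ELEMENT` table, as well as the output elements (`topic`, `title`, `shortdesc`, `body`, `p`, with the `outputclass` attribute used as a review flag), are illustrative choices borrowed from generic DITA rather than the actual Taalportaal-XML schema. Paragraphs with unrecognized styles are kept but flagged, mirroring the manual check described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph styles to DITA-like elements.
# Any style not listed here is treated as body text and flagged for review.
STYLE_TO_ELEMENT = {
    "Heading1": "title",
    "Abstract": "shortdesc",
    "Normal": "p",
}


def convert_topic(topic_id, paragraphs):
    """Turn (style, text) pairs from one Word section into a <topic> element.

    Returns the topic element and a list of issue messages for manual checking.
    """
    title_text = None
    shortdesc_text = None
    body_paras = []  # (text, flagged) pairs in document order
    issues = []

    for style, text in paragraphs:
        tag = STYLE_TO_ELEMENT.get(style)
        if tag == "title":
            title_text = text
        elif tag == "shortdesc":
            shortdesc_text = text
        else:
            flagged = tag is None
            body_paras.append((text, flagged))
            if flagged:
                issues.append(f"unknown style {style!r}: {text[:40]}")

    if title_text is None:
        issues.append("missing title")

    # Build the tree in DITA order: title, then shortdesc, then body.
    topic = ET.Element("topic", id=topic_id)
    ET.SubElement(topic, "title").text = title_text or ""
    if shortdesc_text is not None:
        ET.SubElement(topic, "shortdesc").text = shortdesc_text
    body = ET.SubElement(topic, "body")
    for text, flagged in body_paras:
        p = ET.SubElement(body, "p")
        p.text = text
        if flagged:
            # Keep the text, but mark it so a human reviewer can fix it.
            p.set("outputclass", "needs-review")
    return topic, issues


if __name__ == "__main__":
    sample = [  # hypothetical input, not taken from the Syntax of Dutch
        ("Heading1", "Attributive adjectives"),
        ("Abstract", "This topic introduces attributive adjectives."),
        ("Normal", "Attributive adjectives precede the noun they modify."),
        ("ExampleList", "a. een mooi huis"),
    ]
    topic, issues = convert_topic("attributive-adjectives", sample)
    print(ET.tostring(topic, encoding="unicode"))
    for issue in issues:
        print("check manually:", issue)
```

Collecting the title and short description before building the tree keeps the element order valid even when the source places them out of order. A real converter would also have to deal with inline formatting, example numbering and cross-references, which this sketch ignores.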