IMPACT Final Conference - Steven Krauwer

Presentation by Steven Krauwer (CLARIN): CLARIN & IMPACT: Crossing Paths


    Presentation Transcript

    • CLARIN and IMPACT: Crossing paths Steven Krauwer Utrecht University / CLARIN
    • Overview
      • CLARIN in a nutshell
      • Examples
      • Goals and vision
      • Important features
      • Phasing
      • Structure and shape
      • Essential ingredients
      • Who are in, who are missing
      • Libraries
      • To conclude
    • CLARIN in a nutshell
      • Common Language Resources and Technology Infrastructure
      • Basic idea:
        • European federation of digital repositories with language data and tools (text, speech, multimodal, gesture …)
        • with access to language and speech technology tools through web services to retrieve, manipulate, enhance, explore and exploit data
        • with uniform single sign-on access to archives and tools
        • target audience humanities and social sciences scholars
        • to cover all EU and associated countries
        • and all languages relevant for target audience
    • What should the user be able to ask?
      • give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
      • give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
      • find TV news interviews that involve German speakers with a Dutch accent
      • summarize all articles in European newspapers of August 2010 about OCR – in Spanish
      • show me the pronoun systems in the languages of Nepal
    • Goals and vision – the role of language
      • Language is at the heart of many disciplines in the Humanities and Social Sciences (HSS), e.g.
        • As object of study
        • As a means of human communication
        • As a means of human expression
        • As part of one’s cultural identity
        • As carrier of knowledge and information
    • Goals and vision – advancing research
      • Infrastructure that gives access to digital data and advanced language processing tools will move HSS towards a real e-Science scenario
      • Current picture highly fragmented: lots of data and tools exist, spread all over Europe (and beyond), but hard to find, hard to combine and hard to operate for HSS scholars without technical background
      • CLARIN will provide sustainable access to data and state-of-the-art tools, wherever they are, right from the researcher’s desk, in order to advance and innovate HSS research in the broadest possible sense
    • Important features (1)
      • CLARIN is not about creating new institutions or installations, but builds on what exists
      • CLARIN as such is not focused on technology development or content creation, but aims at integrating what is available and making it accessible, BUT: without content (data and tools) no CLARIN!!!!!
      • CLARIN will not own data and tools but will provide access to these resources
      • CLARIN is not oriented towards commercial markets, but serves the Humanities and Social Sciences research communities
      • CLARIN is not a project but should become a sustainable facility for the research community
    • Important features (2)
      • CLARIN makes it possible for the researcher to find resources (metadata search)
      • CLARIN makes it possible to access resources by single sign-on
      • CLARIN offers access to web services and workflows to perform complex linguistic operations
      • CLARIN allows for virtual collections (originating from different sources)
      • CLARIN serves both expert and non-expert users
      • CLARIN covers both historical and contemporary language based material in all modalities
      • CLARIN is interested in both linguistic data and its content
      • CLARIN finds all languages equally important
    • Phasing of CLARIN
      • Does CLARIN exist? Yes and no.
      • 2008-2011: CLARIN Preparatory Phase Project, funded by EC in the context of the ESFRI Research Infrastructures Roadmap (grant 212230, 4.1M€); Goal: designing the infrastructure technically and organisationally, and lining up the players
      • 2011-2013: Construction Phase, jointly funded by the participating countries; Goal: building the infrastructure
      • 2013-…: Exploitation Phase, jointly funded by the participating countries; Goal: making and keeping it running, and ensuring that it follows new trends in technology and research
    • Structure of CLARIN
      • Three levels:
      • Governed by CLARIN ERIC, an international legal entity which is a consortium of governments (not universities)
      • Two operational levels:
        • Infrastructure level, consisting of centres (one or more per country, fully funded by own government), coordinated by CLARIN ERIC
        • In each country a national consortium responsible for creation of data and tools compliant with CLARIN, nationally funded
      • An ERIC is a new type of international legal entity, essentially a consortium of countries
      • CLARIN ERIC will act as a governance and coordination body, but not as an operational body
      • ERIC member countries finance the activities of the ERIC
      • Countries will each set up a national consortium, with a national coordinator, to represent it and to communicate with CLARIN ERIC – and to do all the work
      • It is up to the countries to decide how to shape and fund their CLARIN consortia and how to relate them to other activities at the national level (e.g. research programmes, digitisation programmes, etc)
      • CLARIN ERIC expected to start Jan 01 2012
    • Shape of CLARIN (diagram)
      • CLARIN ERIC: governs, coordinates and supports
      • CLARIN Infrastructure Layer (technical and knowledge sharing): coordinated by ERIC, operated by countries
      • Members 1 to 5: creation of knowledge, data and tools
    • Essential ingredients of the infrastructure
      • Standards - are crucial to make things work together, but cannot be imposed in a top-down fashion
      • Centres offering stable and reliable services, compliant with CLARIN requirements
      • Every member (i.e. country) is expected to offer at least one centre, but eventually we want to ensure access to all relevant resources in all participating countries (libraries!)
      • Service centres to be complemented by (virtual) competence centres in a variety of fields (IMPACT’s OCR centre)
      • BLUE: this is where our paths cross!
    • Typical CLARIN centres
      • At this moment the operational level includes:
        • Universities willing to share data collections and/or offering tools as web services
        • National academies of sciences
        • National language institutions
        • Other research institutions owning language resources and tools
        • Mostly written language resources
      • Missing
        • Commercial parties (not surprising)
        • Better coverage of other modalities
        • National or other public archives
        • Libraries (just a few in some countries)
    • Libraries
      • Why are they not in?
        • CLARIN very much bottom-up, NLP community driven
        • Difference in digital cultures (libraries starting from access to images of documents rather than to the content)
      • Are there exceptions?
        • Yes, libraries participate in national CLARIN consortia in e.g. NL, DK, FI, …
      • Why is this a problem?
        • Libraries are sitting on 5 centuries of printed material (100 million books!) indispensable for many HSS scholars
        • Scholars should have access to everything, both to images and to content, as well as to services that access both form and content
    • CLARIN and the Libraries
      • What are the obstacles (O) & directions for solutions (S)
        • O: CLARIN and libraries don’t talk to each other on a structural basis to find a common path forward
          • S: libraries have started reflecting on their future role and the ERIC structure may help to bring parties together
        • O: IPR, a problem for all parties
          • S: can only be solved adequately at legislative level
        • O: Common sets of standards required
          • S: Need to set up forum for discussion
        • O: The sheer volume
          • S: Phasing, based on research priorities
        • O: The OCR gap (especially for historical documents)
          • S: Developing better technologies & creation of a European centre of expertise in OCR [== IMPACT!]
    • To conclude
      • My mission for today was
        • To explain what CLARIN is about to those who didn’t know
        • To explain why we feel that CLARIN cannot succeed in its objective to promote and support e-Humanities without close collaboration with the libraries
        • To explain how extremely pleased we are with the IMPACT project
          • Because of its important contribution to the further development and improvement of OCR technology
          • Because of its intention to create a centre of expertise in OCR beyond the horizon of the project
      • I hope that after this meeting we will be able to set up a lasting collaboration between CLARIN and the communities represented by the IMPACT project