IMPACT Final Conference - Steven Krauwer


Presentation by Steven Krauwer (CLARIN): CLARIN and IMPACT: Crossing Paths

Published in: Education, Technology

  1. 1. CLARIN and IMPACT: Crossing paths Steven Krauwer Utrecht University / CLARIN
  2. 2. Overview <ul><li>CLARIN in a nutshell </li></ul><ul><li>Examples </li></ul><ul><li>Goals and vision </li></ul><ul><li>Important features </li></ul><ul><li>Phasing </li></ul><ul><li>Structure and shape </li></ul><ul><li>Essential ingredients </li></ul><ul><li>Who are in, who are missing </li></ul><ul><li>Libraries </li></ul><ul><li>To conclude </li></ul>
  3. 3. CLARIN in a nutshell <ul><li>Common Language Resources and Technology Infrastructure </li></ul><ul><li>Basic idea: </li></ul><ul><ul><li>European federation of digital repositories with language data and tools (text, speech, multimodal, gesture …) </li></ul></ul><ul><ul><li>with access to language and speech technology tools through web services to retrieve, manipulate, enhance, explore and exploit data </li></ul></ul><ul><ul><li>with uniform single sign-on access to archives and tools </li></ul></ul><ul><ul><li>target audience humanities and social sciences scholars </li></ul></ul><ul><ul><li>to cover all EU and associated countries </li></ul></ul><ul><ul><li>and all languages relevant for target audience </li></ul></ul>
  4. 4. What should the user be able to ask? <ul><li>give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350) </li></ul><ul><li>give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943) </li></ul><ul><li>find TV news interviews that involve German speakers with a Dutch accent </li></ul><ul><li>summarize all articles in European newspapers of August 2010 about OCR – in Spanish </li></ul><ul><li>show me the pronoun systems in the languages of Nepal </li></ul>
  5. 5. Goals and vision – the role of language <ul><li>Language is at the heart of many disciplines in the Humanities and Social Sciences (HSS), e.g. </li></ul><ul><ul><li>As object of study </li></ul></ul><ul><ul><li>As a means of human communication </li></ul></ul><ul><ul><li>As a means of human expression </li></ul></ul><ul><ul><li>As part of one’s cultural identity </li></ul></ul><ul><ul><li>As carrier of knowledge and information </li></ul></ul>
  6. 6. Goals and vision – advancing research <ul><li>Infrastructure that gives access to digital data and advanced language processing tools will move HSS towards a real e-Science scenario </li></ul><ul><li>Current picture highly fragmented: lots of data and tools exist, spread all over Europe (and beyond), but hard to find, hard to combine and hard to operate for HSS scholars without a technical background </li></ul><ul><li>CLARIN will provide sustainable access to data and state-of-the-art tools, wherever they are, right from the researcher’s desk, in order to advance and innovate HSS research in the broadest possible sense </li></ul>
  7. 7. Important features (1) <ul><li>CLARIN is not about creating new institutions or installations, but builds on what exists </li></ul><ul><li>CLARIN as such is not focused on technology development or content creation, but aims at integrating what is available and making it accessible, BUT: without content (data and tools) no CLARIN!!!!! </li></ul><ul><li>CLARIN will not own data and tools but will provide access to these resources </li></ul><ul><li>CLARIN is not oriented towards commercial markets, but serves the Humanities and Social Sciences research communities </li></ul><ul><li>CLARIN is not a project but should become a sustainable facility for the research community </li></ul>
  8. 8. Important features (2) <ul><li>CLARIN makes it possible for the researcher to find resources (metadata search) </li></ul><ul><li>CLARIN makes it possible to access resources by single sign-on </li></ul><ul><li>CLARIN offers access to web services and workflows to perform complex linguistic operations </li></ul><ul><li>CLARIN allows for virtual collections (originating from different sources) </li></ul><ul><li>CLARIN serves both expert and non-expert users </li></ul><ul><li>CLARIN covers both historical and contemporary language based material in all modalities </li></ul><ul><li>CLARIN is interested in both linguistic data and its content </li></ul><ul><li>CLARIN finds all languages equally important </li></ul>
  9. 9. Phasing of CLARIN <ul><li>Does CLARIN exist? Yes and no. </li></ul><ul><li>2008-2011: CLARIN Preparatory Phase Project, funded by EC in the context of the ESFRI Research Infrastructures Roadmap (grant 212230, 4.1M€); Goal: designing the infrastructure technically and organisationally, and lining up the players </li></ul><ul><li>2011-2013: Construction Phase, jointly funded by the participating countries; Goal: building the infrastructure </li></ul><ul><li>2013-…: Exploitation Phase, jointly funded by participating countries; Goal: making and keeping it running, and ensuring that it follows new trends in technology and research </li></ul>
  10. 10. Structure of CLARIN <ul><li>Three levels: </li></ul><ul><li>Governed by CLARIN ERIC, an international legal entity which is a consortium of governments (not universities) </li></ul><ul><li>Two operational levels: </li></ul><ul><ul><li>Infrastructure level, consisting of centres (one or more per country, fully funded by own government), coordinated by CLARIN ERIC </li></ul></ul><ul><ul><li>In each country a national consortium responsible for creation of data and tools compliant with CLARIN, nationally funded </li></ul></ul>
  11. 11. CLARIN ERIC <ul><li>An ERIC is a new type of international legal entity, essentially a consortium of countries </li></ul><ul><li>CLARIN ERIC will act as a governance and coordination body, but not as an operational body </li></ul><ul><li>ERIC member countries finance the activities of the ERIC </li></ul><ul><li>Countries will each set up a national consortium, with a national coordinator, to represent it and to communicate with CLARIN ERIC – and to do all the work </li></ul><ul><li>It is up to the countries to decide how to shape and fund their CLARIN consortia and how to relate them to other activities at the national level (e.g. research programmes, digitisation programmes, etc.) </li></ul><ul><li>CLARIN ERIC is expected to start on 1 January 2012 </li></ul>
  12. 12. Shape of CLARIN [diagram: CLARIN]
  13. 13. Shape of CLARIN [diagram: CLARIN, made up of Members 1 to 5; labels: "Creation of knowledge, data and tools", "Infrastructure operations"]
  14. 14. Shape of CLARIN [diagram: CLARIN, with CLARIN ERIC ("Governs, coordinates and supports") above Members 1 to 5; labels: "Creation of knowledge, data and tools", "Infrastructure operations"]
  15. 15. Shape of CLARIN [diagram: CLARIN, with CLARIN ERIC ("Governs, coordinates and supports") above the CLARIN Infrastructure Layer ("technical and knowledge sharing; coordinated by ERIC, operated by countries") above Members 1 to 5; label: "Creation of knowledge, data and tools"]
  16. 16. Essential ingredients of the infrastructure <ul><li>Standards - are crucial to make things work together, but cannot be imposed in a top-down fashion </li></ul><ul><li>Centres offering stable and reliable services, compliant with CLARIN requirements </li></ul><ul><li>Every member (i.e. country) is expected to offer at least one centre, but eventually we want to ensure access to all relevant resources in all participating countries (libraries!) </li></ul><ul><li>Service centres to be complemented by (virtual) competence centres in a variety of fields (IMPACT’s OCR centre) </li></ul><ul><li>BLUE: this is where our paths cross! </li></ul>
  17. 17. Typical CLARIN centres <ul><li>At this moment the operational level includes: </li></ul><ul><ul><li>Universities willing to share data collections and/or offering tools as web services </li></ul></ul><ul><ul><li>National academies of sciences </li></ul></ul><ul><ul><li>National language institutions </li></ul></ul><ul><ul><li>Other research institutions owning language resources and tools </li></ul></ul><ul><ul><li>Mostly written language resources </li></ul></ul><ul><li>Missing </li></ul><ul><ul><li>Commercial parties (not surprising) </li></ul></ul><ul><ul><li>Better coverage of other modalities </li></ul></ul><ul><ul><li>National or other public archives </li></ul></ul><ul><ul><li>Libraries (just a few in some countries) </li></ul></ul>
  18. 18. Libraries <ul><li>Why are they not in? </li></ul><ul><ul><li>CLARIN very much bottom-up, NLP community driven </li></ul></ul><ul><ul><li>Difference in digital cultures (libraries starting from access to images of documents rather than to the content) </li></ul></ul><ul><li>Are there exceptions? </li></ul><ul><ul><li>Yes, libraries participate in national CLARIN consortia in e.g. NL, DK, FI, … </li></ul></ul><ul><li>Why is this a problem? </li></ul><ul><ul><li>Libraries are sitting on 5 centuries of printed material (100 million books!) indispensable for many HSS scholars </li></ul></ul><ul><ul><li>Scholars should have access to everything, both to images and to content, as well as to services that access both form and content </li></ul></ul>
  19. 19. CLARIN and the Libraries <ul><li>What are the obstacles (O) and directions for solutions (S)? </li></ul><ul><ul><li>O: CLARIN and libraries don’t talk to each other on a structural basis to find a common path forward. S: libraries have started reflecting on their future role, and the ERIC structure may help to bring parties together </li></ul></ul><ul><ul><li>O: IPR, a problem for all parties. S: can only be solved adequately at legislative level </li></ul></ul><ul><ul><li>O: Common sets of standards required. S: Need to set up a forum for discussion </li></ul></ul><ul><ul><li>O: The sheer volume. S: Phasing, based on research priorities </li></ul></ul><ul><ul><li>O: The OCR gap (especially for historical documents). S: Developing better technologies and creating a European centre of expertise in OCR [ == IMPACT!] </li></ul></ul>
  20. 20. To conclude <ul><li>My mission for today was </li></ul><ul><ul><li>To explain to those who didn’t know what CLARIN is about </li></ul></ul><ul><ul><li>To explain why we feel that CLARIN cannot succeed in its objective to promote and support e-Humanities without close collaboration with the libraries </li></ul></ul><ul><ul><li>To explain how extremely pleased we are with the IMPACT project </li></ul></ul><ul><ul><ul><li>Because of its important contribution to the further development and improvement of OCR technology </li></ul></ul></ul><ul><ul><ul><li>Because of its intention to create a centre of expertise in OCR beyond the horizon of the project </li></ul></ul></ul><ul><li>I hope that after this meeting we will be able to set up a lasting collaboration between CLARIN and the communities represented by the IMPACT project </li></ul>