1. CLARIN – Corpora, corpus tools and collaboration
Martin Wynne
martin.wynne@ling-phil.ox.ac.uk
Faculty of Linguistics, Philology and Phonetics,
University of Oxford
National Coordinator, CLARIN-UK
Introducing the Written BNC2014
Friday 19th November 2021
Lancaster Castle
2. CLARIN – a short summary
• CLARIN is the Common Language Resources and Technology
Infrastructure
• a European Research Infrastructure Consortium (ERIC) since 2012
• which provides easy and sustainable access for scholars in the
humanities and social sciences and beyond
• to digital language data (in written, spoken, video or multimodal form)
• and advanced tools to discover, explore, exploit, annotate, analyse or
combine them, wherever they are located
• through a single sign-on environment
• that serves as an ecosystem for knowledge exchange
• and ready for integration in EOSC (European Open Science Cloud)
CLARIN Value Proposition: https://www.clarin.eu/content/value-proposition
3. CLARIN – an even shorter summary
CLARIN aims to transform the current fragmented landscape where digital
tools and applications are not necessarily easy to find, or use, or connect
together, and are often supported only for the lifetime of a fixed-term
funded project…
...to a situation where there is ongoing support for access, re-use,
sustainability and connectivity, with services embedded in stable,
long-term centres.
4. CLARIN ERIC in members and centres
A consortium of:
• 21 members: AT, BG, CY, CZ, DE,
DK, EE, FI, GR, HR, HU, IS, IT,
LT, LV, NL, NO, PL, PT, SE, SI
• 3 observers: FR, UK, ZA
• >60 centres
(incl. 24 certified data centres)
6. CLARIN in data types
• Language corpora
• Newspaper archives
• Literary texts
• Social Media data
• Parliamentary records
• Historical correspondence
• Oral History data
• Broadcast archives
• …
See also the info on the CLARIN Resource Families initiative: https://www.clarin.eu/resource-families
7. CLARIN in communities of use
• Digital Humanities
• Linguistics and Philology
• Translation and Lexicography
• Literary Studies
• History
• Political and Social Sciences
• Media Studies
• Culture, Folklore, Anthropology
• Speech therapy
• Teachers
• General Public
• …
8. CLARIN and Open Science
• Promoting the sharing and re-use of data through sustainable data registries
• All integrated datasets available in open access for research purposes
• Adherence to the FAIR data principles
- Findable, Accessible, Interoperable, Re-usable
- Interoperability through a common metadata framework
• Promotion of responsible data science
• Support for linguistic diversity
- Data covering more than 1500 languages
- Tools for many languages
- Language resources in all modalities
• Strengthening the support for 500,000 professional SSH researchers
"CLARIN: Towards FAIR and Responsible Data Science Using Language Resources." In: Proceedings of the
Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, 3259-3264.
9. CLARIN for knowledge exchange
• A network of Knowledge Centres (K-centres)
• Sharing of expertise and best practices
- Annual Conference
- Support for workshops and mobility
- Ambassador network
• Capacity training through live events, online courses and webinars
- for developers
- for end-users
• Collaboration with
- other SSH infrastructures (e.g. CESSDA, DARIAH-EU, ESS, SHARE; cf. also H2020
project SSHOC)
- EOSC-related projects
- Europeana
- LIBER
11. CLARIN-UK
• Started with three universities in the CLARIN Preparatory Phase project
(Lancaster, Oxford, Sheffield)
• CLARIN-UK Consortium formed in 2014 (ten members initially)
• UK Observer in CLARIN ERIC 2015-21
• Moving towards full membership - Extraordinary extension with additional
fees 2021
• UK national research infrastructure funding expected 2022 onwards