CLARIN-NL
Results and Evaluation
Jan Odijk
CLARIN-NL Final Event
Hilversum, 2015-03-13
Overview
• Infrastructure Core
– CLARIN Centres
– Metadata and Searching for data
– Federated Content Search
• Resource Curation
– Data Curation
– Software Curation & Web Applications
• Interoperability
• What you can do
• Education and Training
• Conclusions
Infrastructure Core
• 5 CLARIN Centres (‘Type B Centres’)
1. MPI
2. Meertens Institute
3. INL
4. Huygens ING
5. DANS
• 3 CLARIN Data Providers (‘Type D Centres’)
1. National Library (KB)
2. Utrecht University Library
3. Netherlands Institute for Sound and Vision
Infrastructure Core
• CLARIN Centres
– Have set up a proper repository system
• So resources can be stored there
– Have their CMDI-metadata harvestable
• So resources are visible to others
– Support for persistent identifiers (PIDs)
• So links to resources are ‘never’ broken
– Long-term archiving solution in place
• So resources will not get lost
– Provisions for federated identity management
• So you can login with your own institute account (single sign-on)
– Have acquired the Data Seal of Approval
• So the data repositories can be trusted and are sustainable
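The "harvestable CMDI metadata" point can be illustrated with a minimal sketch. It assumes a centre exposes its metadata over OAI-PMH, the protocol commonly used for metadata harvesting. The endpoint URL is hypothetical, and the `cmdi` metadata prefix is an assumption: the prefixes a repository actually supports are listed by its `ListMetadataFormats` response.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; each centre publishes its own.
ENDPOINT = "https://repository.example.org/oai"


def list_records_url(endpoint, metadata_prefix="cmdi", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    The default prefix "cmdi" is an assumption, not taken from the slides.
    """
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it must not be combined with metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return endpoint + "?" + urlencode(params)


print(list_records_url(ENDPOINT))
```

A harvester would fetch this URL, store the returned records, and repeat the request with the resumption token from each response until none is returned.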
Infrastructure Core
• CLARIN Type A Centres in NL
– Offer services for the whole CLARIN infrastructure
– Mainly MPI, some Meertens (and UU)
• Enables you to search for resources:
– Harvesting of metadata, Virtual Language Observatory, Meertens Metadata Search (Meertens), CLARIN-NL Portal (UU)
• Enables you to create metadata
– CMDI registry, CMDI Profile editor, Metadata editor
• Enables you to ensure semantic interoperability
– ISOCAT, RELCAT, SchemaCat
– CLAVAS, CLARIN Concept Registry (Meertens)
– Transfer from MPI to other centres (in EU) on-going
Infrastructure Core
• Metadata and Metadata Search
– CMDI metadata created for all data dealt with in CLARIN-NL
– Using flexible CMDI
• If needed, with user defined profiles and components
– Searching for data possible via the
• VLO
• Meertens Metadata Search
• Some work done on metadata for software
– Partially reflected in CLARIN-NL Portal
– But not (yet) in CMDI records / VLO
Infrastructure Core
• Metadata and Metadata Search
– CMDI is ‘too flexible’
– Big variation in granularity
– Hardly any metadata elements are required
• Some crucial metadata elements are lacking
• VLO
– Gives access to over 800k metadata records
– KB metadata are not included (> 1 million)
– Many records are of external origin, with no control over the metadata
– Limited search options via VLO
• Search via VLO is not as easy as it should be
• CLARIN-NL Portal improves this for NL resources
• Will be taken up in CLARIAH
Infrastructure Core
• Federated Content Search (FCS)
– Search multiple distributed data sources via a single interface
• NL centres created ‘end points’ for selected resources
– So they can participate in FCS
• Development of search interface and aggregator
– Different approaches NL v. DE
– NL development stopped; the DE approach was adopted
– See CLARIN-D FCS Aggregator
• So far, only string (keyword) search is possible
• Will be taken up again in CLARIAH
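A keyword search against a single FCS end point can be sketched as follows. CLARIN FCS builds on the SRU protocol with CQL queries; the sketch assumes SRU version 1.2 and its standard `searchRetrieve` parameters. The endpoint URL and the function name `fcs_search_url` are hypothetical.

```python
from urllib.parse import urlencode


def fcs_search_url(endpoint, term, max_records=10):
    """Build an SRU searchRetrieve URL for a simple keyword query.

    Assumes SRU 1.2; the endpoint passed in is up to the caller.
    """
    # Quote the term as a CQL string, escaping backslashes and double quotes.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": f'"{escaped}"',
        "maximumRecords": str(max_records),
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical end point, for illustration only.
print(fcs_search_url("https://fcs.example.org/sru", "fiets"))
```

An aggregator sends the same query to every participating end point and merges the results, which is why all end points need to accept the same query language.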
Data Curation
• By the CLARIN Data Curation Service (DCS)
– E.g. LESLLA, dialect dictionaries, IPNV Interviews with veterans
• Via open calls and closed calls
– In many (small) projects
• Recent examples: VALID, DSS, eBNM+
• Broad coverage of the humanities
• Contributed significantly to user involvement
Data Curation
Discipline                      Count
Linguistics                     16
History                         9
Literary Studies                5
Culture Sciences                4
Communication & Media Studies   2
Religion Studies                2
Computational Linguistics       1
Philosophy                      1
Political Sciences              1
Data Curation
Linguistics subfield            Count
Acquisition                     5
Historical Linguistics          4
Syntax                          4
Morpho-syntax                   3
Discourse                       2
Language Documentation          2
Lexicology                      2
Others (6 subfields)            1 each
Software Curation / Web Applications
• Via open calls and closed calls
– In many (small) projects
– Curation / upgrades of existing software
• AVResearcherXL (QuaMerdes), SHEBANQ, ColTime and EXILSEA upgrades of ELAN, PaQu, Cornetto Interface, …
– Creation of new software
• DSS, eBNM+, RemBench, OpenSONAR, PICCL, AutoSearch, …
– Broad coverage of the humanities
– Contributed significantly to user involvement
Software Curation / Web Applications
Discipline                      Count
Linguistics                     27
History                         14
Literary Studies                5
Communication & Media Studies   4
Cultural Sciences               4
Political Sciences              4
Computational Linguistics       3
Others (3 disciplines)          1-2 each
Software Curation / Web Applications
Linguistics subfield            Count
Syntax                          13
Morpho-syntax                   7
Historical Linguistics          5
Lexicology                      5
Dialectology                    4
Sign Language                   4
Others (7 subfields)            2 each
Interoperability
• Interoperability
– Do tools apply to data seamlessly?
– Can data be combined seamlessly?
– Can tools be combined seamlessly?
– Does CLARIN support data in real-world formats?
Interoperability
• Syntactic Interoperability
– FoLIA is becoming a de facto standard format for linguistically annotated text corpora in the Netherlands
• TTNWW, PICCL, VU-DNC, Nederlab, Basilex, …
– CLAM is the de facto standard in NL for turning software into RESTful web services
– But
• there are also other important formats that must be supported (TEI, LASSY XML, …)
• and there are still too many ad-hoc formats, often without explicit syntax and semantics
Interoperability
• Semantic Interoperability
– Data Categories for metadata elements are actually used (e.g. in the VLO)
– Data Categories for many data (content) elements have been defined but are hardly used yet
– ISOCAT was a useful data category registry
• But had many problems
– Now replaced by the CLARIN Concept Registry
• Solves some of ISOCAT’s problems but not all
• Will be addressed in CLARIAH
Interoperability
• Support for real world formats
– New research data do not come in standardized formats
– But as mixtures of .doc, .docx, HTML, PDF, plain text, ePub, …
– And multiple standard formats must be supported in CLARIN (e.g. both FoLIA and TEI)
– Support for data conversions via the OpenConvert project
– But more is needed
• Will be addressed in CLARIAH
What you can do
• Find and select existing data
– Virtual Language Observatory, Meertens Metadata Search, CLARIN-NL Portal
• Create new data through OCR and orthographic normalisation
– PICCL
• Create metadata for new or existing data
– CMDI Registry, CMDI profile editor, metadata editors (e.g. ARBIL), …
What you can do
• Make semantics of metadata and data explicit
– ISOCAT, RELCAT, SchemaCAT
• now replaced by CLARIN Concept Registry (CCR)
– CLAVAS
• Enrich data with various kinds of annotations
– TTNWW
• Orthographic normalisation, POS tagging, lemmatisation, parsing, named entity recognition, …
– Adelheid, INPOLDER, PaQu, ColTime and EXILSEA extensions to ELAN
• Upload enriched data to search applications
– PaQu, AutoSearch
What you can do
• Search and browse data, and analyze (meta)data and query results
– OpenSONAR, GrETEL, PaQu, MIMORE, FESLI, SHEBANQ, AutoSearch, …
– Arthurian Fiction, NameScape, COBWWWEB, eBNM+, C-DSD, DSS, RemBench, Nederlab, …
– Interviews, WIP, VK, Polimedia, CKCC, DSS, AVResearcherXL, …
– DUELME, WFT-GTB, CORNETTO, …
What you can do
• Visualize data analyses
– COAVA, FESLI, MIMORE, Gabmap, SHEBANQ, Nederlab, OpenSONAR, …
– CKCC, MIGMAP, AVResearcherXL
• Store new data safely at a CLARIN Centre
– All 5 centres have the Data Seal of Approval
– 4 centres are certified CLARIN Centres
Invitation
• Use (elements from) the CLARIN infrastructure
• Join user groups of specific services
• Provide feedback so that we can further improve CLARIN
• So that you can improve your research
Education & Training
• How do you learn to use these tools?
– Courses / tutorials regularly organized
– LOT summer / winter school courses
– Demonstration scenarios and/or screen casts
• E.g. for Gabmap, GrETEL, OpenSONAR
– Educational modules via the portal:
• https://dev.clarin.nl/node/CLARIN%20Educational%20Packages
– Helpdesk: helpdesk@clarin.nl
Education & Training
• Do you want to know more?
– Visit the CLARIN-NL portal
• http://portal.clarin.nl
– View the CLARIN-NL movies
• http://www.clarin.nl/node/403
– Visit the demonstrations today
– Ask me (or others) today
Conclusions (1)
• CLARIN is starting to provide the data, facilities and services to carry out humanities research supported by large amounts of data and tools
• With easy interfaces and easy search options (no technical background needed)
• Some training in using the tools is needed
– To use the possibilities optimally
– To understand the limitations of the data and the tools
– Educational modules for selected functionality are available
– Tutorials / trainings will continue to be regularly organized
Conclusions (2)
• But there is still a lot to do
– Extensions of and improvements in metadata
– Improvements of VLO
– Improved functionality for most tools
• Needs and wishes identified through actual use of the tools
– Extend and improve search options for individual resources
– Create options of searching across different resources of the same type
– Improved interoperability
Conclusions (3)
• A successor project is needed!
• CLARIAH www.clariah.nl
• Proposal approved June 1, 2014
• Started Jan 1st, 2015
• Kick-off this afternoon
THANKS FOR YOUR ATTENTION!
38

More Related Content

What's hot

Liber's digital preservation projects
Liber's digital preservation projectsLiber's digital preservation projects
Liber's digital preservation projects
LIBER Europe
 

What's hot (20)

ELIXIR Standards and Formats: ISA Tools and FAIRsharing
ELIXIR Standards and Formats: ISA Tools and FAIRsharingELIXIR Standards and Formats: ISA Tools and FAIRsharing
ELIXIR Standards and Formats: ISA Tools and FAIRsharing
 
Bdva - iSpaces
Bdva - iSpacesBdva - iSpaces
Bdva - iSpaces
 
Certification of data repositories - CoreTrustSeal
Certification of data repositories - CoreTrustSealCertification of data repositories - CoreTrustSeal
Certification of data repositories - CoreTrustSeal
 
Repositories for long-term preservation - certification
Repositories for long-term preservation - certificationRepositories for long-term preservation - certification
Repositories for long-term preservation - certification
 
Eudat research data management services | www.eudat.eu |
Eudat research data management services | www.eudat.eu | Eudat research data management services | www.eudat.eu |
Eudat research data management services | www.eudat.eu |
 
EOSC Summit 2018 - RDA Europe Presentation
EOSC Summit 2018 - RDA Europe PresentationEOSC Summit 2018 - RDA Europe Presentation
EOSC Summit 2018 - RDA Europe Presentation
 
Liber's digital preservation projects
Liber's digital preservation projectsLiber's digital preservation projects
Liber's digital preservation projects
 
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
Martin Donnelly - Digital Data Curation at the Digital Curation Centre (DH2016)
 
Defining collections and creating their descriptions
Defining collections and creating their descriptionsDefining collections and creating their descriptions
Defining collections and creating their descriptions
 
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
Rebecca Grant, Sharon Webb - Preserving Ireland's Digital Cultural Identity T...
 
Developing a national digital library stapel - meijers 20160302
Developing a national digital library   stapel - meijers 20160302Developing a national digital library   stapel - meijers 20160302
Developing a national digital library stapel - meijers 20160302
 
CTDA Brown Bag, Oct. 2016
CTDA Brown Bag, Oct. 2016CTDA Brown Bag, Oct. 2016
CTDA Brown Bag, Oct. 2016
 
Data for Development in the Caribbean
Data for Development in the CaribbeanData for Development in the Caribbean
Data for Development in the Caribbean
 
Connecticut Digital Archive
Connecticut Digital ArchiveConnecticut Digital Archive
Connecticut Digital Archive
 
Sensitive Data Workshop
Sensitive Data WorkshopSensitive Data Workshop
Sensitive Data Workshop
 
Natalie Harrower - Digital Data Sharing (DH2016)
Natalie Harrower - Digital Data Sharing (DH2016)Natalie Harrower - Digital Data Sharing (DH2016)
Natalie Harrower - Digital Data Sharing (DH2016)
 
Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)Rebecca Grant - DH research data: identification and challenges (DH2016)
Rebecca Grant - DH research data: identification and challenges (DH2016)
 
The ENVRI user landscape
The ENVRI user landscapeThe ENVRI user landscape
The ENVRI user landscape
 
The Data Deluge: the Role of Research Organisations
The Data Deluge: the Role of Research OrganisationsThe Data Deluge: the Role of Research Organisations
The Data Deluge: the Role of Research Organisations
 
Kyriazidou Sofia poster
Kyriazidou Sofia posterKyriazidou Sofia poster
Kyriazidou Sofia poster
 

Viewers also liked (6)

Clariah kickoff presentation Media agenda
Clariah kickoff presentation Media agendaClariah kickoff presentation Media agenda
Clariah kickoff presentation Media agenda
 
Clariah lex heerma_van_voss_presentatie ben_g
Clariah lex heerma_van_voss_presentatie ben_gClariah lex heerma_van_voss_presentatie ben_g
Clariah lex heerma_van_voss_presentatie ben_g
 
Clariah jan-luiten 13_maart_2015_1
Clariah jan-luiten 13_maart_2015_1Clariah jan-luiten 13_maart_2015_1
Clariah jan-luiten 13_maart_2015_1
 
Clariah hans-bennis kickoff_clariah
Clariah hans-bennis kickoff_clariahClariah hans-bennis kickoff_clariah
Clariah hans-bennis kickoff_clariah
 
Clariah arianna betti_keynote
Clariah arianna betti_keynoteClariah arianna betti_keynote
Clariah arianna betti_keynote
 
Clariah jan odijk_zaaigeldprojecten
Clariah jan odijk_zaaigeldprojectenClariah jan odijk_zaaigeldprojecten
Clariah jan odijk_zaaigeldprojecten
 

Similar to Clarin nl odijk-final_event_2015-03-13

The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data FutureThe Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
NASIG
 
Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...
Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...
Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...
Lucy McKenna
 

Similar to Clarin nl odijk-final_event_2015-03-13 (20)

Applying Digital Library Metadata Standards
Applying Digital Library Metadata StandardsApplying Digital Library Metadata Standards
Applying Digital Library Metadata Standards
 
The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data FutureThe Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
The Canadian Linked Data Initiative: Charting a Path to a Linked Data Future
 
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
TLC2016 - A search engine for Blackboard Learn, the impossible made possible.
 
Identity Management: Tools, processes & services
Identity Management: Tools, processes & servicesIdentity Management: Tools, processes & services
Identity Management: Tools, processes & services
 
Semantic Search
Semantic SearchSemantic Search
Semantic Search
 
RDM @ KU Leuven: De verbindende kracht van het Research Data Management Compe...
RDM @ KU Leuven: De verbindende kracht van het Research Data Management Compe...RDM @ KU Leuven: De verbindende kracht van het Research Data Management Compe...
RDM @ KU Leuven: De verbindende kracht van het Research Data Management Compe...
 
‘Development of a MODS-RDF Cataloguing Tool for the Digital Resources and Ima...
‘Development of a MODS-RDF Cataloguing Tool for the Digital Resources and Ima...‘Development of a MODS-RDF Cataloguing Tool for the Digital Resources and Ima...
‘Development of a MODS-RDF Cataloguing Tool for the Digital Resources and Ima...
 
Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...
Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...
Development of a MODS-RDF Cataloguing Tool for Information Professionals CONU...
 
A Survey of Exploratory Search Systems Based on LOD Resources
A Survey of Exploratory Search Systems Based on LOD ResourcesA Survey of Exploratory Search Systems Based on LOD Resources
A Survey of Exploratory Search Systems Based on LOD Resources
 
Successful E-Resource Acquisitions: Looking Beyond Selecting, Ordering, Payin...
Successful E-Resource Acquisitions: Looking Beyond Selecting, Ordering, Payin...Successful E-Resource Acquisitions: Looking Beyond Selecting, Ordering, Payin...
Successful E-Resource Acquisitions: Looking Beyond Selecting, Ordering, Payin...
 
SAICSIT 2011 Postgraduate Symposium Presentation
SAICSIT 2011 Postgraduate Symposium PresentationSAICSIT 2011 Postgraduate Symposium Presentation
SAICSIT 2011 Postgraduate Symposium Presentation
 
From Expert Finding to Entity Search on the Web
From Expert Finding to Entity Search on the Web From Expert Finding to Entity Search on the Web
From Expert Finding to Entity Search on the Web
 
Librarian building blocks; or, how to make the ideal librarian
Librarian building blocks; or, how to make the ideal librarianLibrarian building blocks; or, how to make the ideal librarian
Librarian building blocks; or, how to make the ideal librarian
 
Building an Innovative Learning Ecosystem at Scale with Graph Technologies
Building an Innovative Learning Ecosystem at Scale with Graph TechnologiesBuilding an Innovative Learning Ecosystem at Scale with Graph Technologies
Building an Innovative Learning Ecosystem at Scale with Graph Technologies
 
Lauruhn-5-jun15
Lauruhn-5-jun15Lauruhn-5-jun15
Lauruhn-5-jun15
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
Collaborative km-from-seci-model-tiim
Collaborative km-from-seci-model-tiimCollaborative km-from-seci-model-tiim
Collaborative km-from-seci-model-tiim
 
Collaborative km-from-seci-model-tiim
Collaborative km-from-seci-model-tiimCollaborative km-from-seci-model-tiim
Collaborative km-from-seci-model-tiim
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the library
 

More from CLARIAH

More from CLARIAH (20)

ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
ACAD Presentation by Wilbert Spooren, CLARIAH Toogdag 19-10-2018
 
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
DB:CCC Presentation of Karin Hofmeester, CLARIAH Toogdag 19-10-2018
 
Masterclass innosurance 2018
Masterclass innosurance 2018Masterclass innosurance 2018
Masterclass innosurance 2018
 
Flat TLA
Flat TLAFlat TLA
Flat TLA
 
QB'er demonstration
QB'er demonstrationQB'er demonstration
QB'er demonstration
 
Collection registration for the CLARIAH Media Suite.
Collection registration for the CLARIAH Media Suite.Collection registration for the CLARIAH Media Suite.
Collection registration for the CLARIAH Media Suite.
 
CMDI2RDF
CMDI2RDFCMDI2RDF
CMDI2RDF
 
2016 05-20-clariah-wp4
2016 05-20-clariah-wp42016 05-20-clariah-wp4
2016 05-20-clariah-wp4
 
2016 05-20-clariah-wp3
2016 05-20-clariah-wp32016 05-20-clariah-wp3
2016 05-20-clariah-wp3
 
2016 05-20-clariah-wp2
2016 05-20-clariah-wp22016 05-20-clariah-wp2
2016 05-20-clariah-wp2
 
2016 05-20-clariah-wp5
2016 05-20-clariah-wp52016 05-20-clariah-wp5
2016 05-20-clariah-wp5
 
MTAS Henny Brugman
MTAS Henny BrugmanMTAS Henny Brugman
MTAS Henny Brugman
 
LREC Ton vd Wouden
LREC Ton vd WoudenLREC Ton vd Wouden
LREC Ton vd Wouden
 
Paqu Gertjan van Noord en Jan Odijk
Paqu Gertjan van Noord en Jan OdijkPaqu Gertjan van Noord en Jan Odijk
Paqu Gertjan van Noord en Jan Odijk
 
Open sonar martinreynaert
Open sonar martinreynaertOpen sonar martinreynaert
Open sonar martinreynaert
 
Struc data Auke Rijpma
Struc data Auke RijpmaStruc data Auke Rijpma
Struc data Auke Rijpma
 
Diachronous conceptuallexicons Marieke van Erp / Piek Vossen
Diachronous conceptuallexicons Marieke van Erp / Piek VossenDiachronous conceptuallexicons Marieke van Erp / Piek Vossen
Diachronous conceptuallexicons Marieke van Erp / Piek Vossen
 
Corpus studio Erwin Komen
Corpus studio Erwin KomenCorpus studio Erwin Komen
Corpus studio Erwin Komen
 
Athena richard zijdeman
Athena richard zijdemanAthena richard zijdeman
Athena richard zijdeman
 
Struc data aukerijpma
Struc data aukerijpmaStruc data aukerijpma
Struc data aukerijpma
 

Recently uploaded

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Clarin nl odijk-final_event_2015-03-13

  • 1. CLARIN-NL Results and Evaluation Jan Odijk CLARIN-NL Final Event Hilversum, 2015-03-13 1
  • 2. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions 2
  • 3. Overview • Infrastructure Core CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions 3
  • 4. Infrastructure Core • 5 CLARIN Centres (‘Type B Centres’) 1. MPI 2. Meertens Institute 3. INL 4. Huygens ING 5. DANS • 3 CLARIN Data Providers (‘Type D Centres’) 1. National Library (KB) 2. Utrecht University Library 3. Netherlands Institute for Sound and Vision 4
  • 5. Infrastructure Core • CLARIN Centres – Have set up a proper repository system • So resources can be stored there – Have their CMDI-metadata harvestable • So resources are visible to others – Support for persistent identifiers (PIDs) • So links to resources are ‘never’ broken – Long-term archiving solution in place • So resources will not get lost – Provisions for federated identity management • So you can login with your own institute account (single sign-on) – Have acquired the Data Seal of Approval • So the data repositories can be trusted and are sustainable 5
  • 6. Infrastructure Core • CLARIN Type A Centres in NL – Offers services for the whole CLARIN infrastructure – Mainly MPI, some Meertens (and UU) • Enables you to search for resources: – Harvesting of metadata , Virtual Language Observatory, Meertens Metadata Search (Meertens), CLARIN-NL Portal (UU) • Enables you to create metadata – CMDI registry, CMDI Profile editor, Metadata editor • Enables you to ensure semantic interoperability – ISOCAT, RELCAT, SchemaCat – CLAVAS, CLARIN Concept Registry (Meertens) – Transfer from MPI to other centres (in EU) on-going 6
  • 7. Overview • Infrastructure Core – CLARIN Centres Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions 7
  • 8. Infrastructure Core • Metadata and Metadata Search – CMDI metadata created for all data dealt with in CLARIN- NL – Using flexible CMDI • If needed, with user defined profiles and components – Searching for data possible via the • VLO • Meertens Metadata Search • Some work done on metadata for software – Partially reflected in CLARIN-NL Portal – But not (yet) in CMDI records / VLO 8
  • 9. Infrastructure Core • Metadata and Metadata Search – CMDI is ‘too flexible’ – Big variation in granularity – Hardly any metadata elements are required • Some crucial metadata elements are lacking • VLO – Gives access to over 800k metadata records – KB metadata are not included (> 1 million) – Many records are of external origin, with no control over the metadata – Limited search options via VLO • Search via VLO is not as easy as it should be • CLARIN-NL Portal improves this for NL resources • Will be taken up in CLARIAH
  • 10. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 11. Infrastructure Core • Federated Content Search (FCS) – Search via a single interface across multiple, distributed data resources • NL centres created ‘end points’ for selected resources – So they can participate in FCS • Development of search interface and aggregator – Different approaches NL vs. DE – NL development stopped; DE approach adopted – See CLARIN-D FCS Aggregator • So far, only string (keyword) search is possible • Will be taken up again in CLARIAH
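The idea behind federated content search on this slide, one query sent to several end points with the hits merged, can be sketched as follows. This is a toy model under stated assumptions: the end points are in-memory lists rather than network services, the names and sentences are invented, and nothing here reflects the actual CLARIN-D FCS Aggregator code or protocol.

```python
from typing import Dict, List, Tuple

# Toy stand-ins for centre end points: each name maps to the texts it holds.
# Names and sentences are invented for illustration only.
ENDPOINTS: Dict[str, List[str]] = {
    "centre_a": ["de kat zit op de mat", "het regent vandaag"],
    "centre_b": ["een kat en een hond", "morgen schijnt de zon"],
}


def search_endpoint(texts: List[str], keyword: str) -> List[str]:
    """Plain string (keyword) search within a single end point."""
    needle = keyword.lower()
    return [text for text in texts if needle in text.lower()]


def federated_search(
    endpoints: Dict[str, List[str]], keyword: str
) -> List[Tuple[str, str]]:
    """Send the same query to every end point and merge the hits.

    Each hit is returned as (end point name, matching text), ordered by
    end point name so the merged result is deterministic.
    """
    hits: List[Tuple[str, str]] = []
    for name in sorted(endpoints):
        for text in search_endpoint(endpoints[name], keyword):
            hits.append((name, text))
    return hits


results = federated_search(ENDPOINTS, "kat")
for source, text in results:
    print(f"{source}: {text}")
```

The sketch only supports substring matching, which mirrors the slide's remark that only string (keyword) search is possible so far.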
  • 12. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 13. Data Curation • By the CLARIN Data Curation Service (DCS) – E.g. LESLLA, dialect dictionaries, IPNV Interviews with veterans • Via open calls and closed calls – In many (small) projects • Recent examples: VALID, DSS, eBNM+ • Broad coverage of the humanities • Contributed significantly to user involvement
  • 14. Data Curation • Discipline: Count – Linguistics: 16 – History: 9 – Literary Studies: 5 – Culture Sciences: 4 – Communication & Media Studies: 2 – Religion Studies: 2 – Computational Linguistics: 1 – Philosophy: 1 – Political Sciences: 1
  • 15. Data Curation • Linguistics: Count – Acquisition: 5 – Historical Linguistics: 4 – Syntax: 4 – Morpho-syntax: 3 – Discourse: 2 – Language Documentation: 2 – Lexicology: 2 – 6 others: 1 each
  • 16. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 17. Software Curation / Web Applications • Via open calls and closed calls – In many (small) projects – Curation / upgrades of existing software • AVResearcherXL (QuaMerdes), SHEBANQ, ColTime and EXILSEA upgrades of ELAN, PaQu, Cornetto Interface, … – Creation of new software • DSS, eBNM+, RemBench, OpenSONAR, PICCL, AutoSearch, … – Broad coverage of the humanities – Contributed significantly to user involvement
  • 18. Software Curation / Web Applications • Discipline: Count – Linguistics: 27 – History: 14 – Literary Studies: 5 – Communication & Media Studies: 4 – Cultural Sciences: 4 – Political Sciences: 4 – Computational Linguistics: 3 – 3 others: 1-2 each
  • 19. Software Curation / Web Applications • Linguistics: Count – Syntax: 13 – Morpho-syntax: 7 – Historical Linguistics: 5 – Lexicology: 5 – Dialectology: 4 – Sign Language: 4 – 7 others: 2 each
  • 20. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 21. Interoperability • Interoperability – Do tools apply to data seamlessly? – Can data be combined seamlessly? – Can tools be combined seamlessly? – Does CLARIN support data in real-world formats?
  • 22. Interoperability • Syntactic Interoperability – FoLiA is becoming a de facto standard format for linguistically annotated text corpora in the Netherlands • TTNWW, PICCL, VU-DNC, Nederlab, Basilex, … – CLAM is the de facto standard in NL for turning software into RESTful web services – But • There are also other important formats that must be supported (TEI, LASSY XML, …) • And there are still too many ad hoc formats, often without explicit syntax and semantics
  • 23. Interoperability • Semantic Interoperability – Data Categories for metadata elements are actually used (e.g. in the VLO) – Data Categories for many data (content) elements are defined but hardly used yet – ISOCAT was a useful data category registry • But had many problems – Now replaced by the CLARIN Concept Registry • Solves some of ISOCAT’s problems but not all • Will be addressed in CLARIAH
  • 24. Interoperability • Support for real-world formats – New research data do not come in standardized formats – But as mixtures of .doc, .docx, HTML, PDF, plain text, ePub, … – And multiple standard formats must be supported in CLARIN (e.g. both FoLiA and TEI) – Support for data conversions via the OpenConvert project – But more is needed • Will be addressed in CLARIAH
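The mixed-format problem on this slide can be made concrete with a small triage sketch: given a batch of incoming files, decide which ones would need conversion before they can be used. Everything here is an assumption for illustration: the file names are invented, XML files are assumed to be already in a supported format such as FoLiA or TEI, and the sketch does not call OpenConvert or any other CLARIN service.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, List

# File types the slide lists as typical "real-world" input.
NEEDS_CONVERSION = {".doc", ".docx", ".html", ".pdf", ".txt", ".epub"}

# Assumption: XML input is already in a supported standard format.
ALREADY_SUPPORTED = {".xml"}


def triage(filenames: List[str]) -> Dict[str, List[str]]:
    """Group file names into 'convert', 'keep' and 'unknown' by extension."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        if suffix in NEEDS_CONVERSION:
            plan["convert"].append(name)
        elif suffix in ALREADY_SUPPORTED:
            plan["keep"].append(name)
        else:
            plan["unknown"].append(name)
    return dict(plan)


# Invented example batch.
plan = triage(["letters.docx", "corpus.xml", "scan.PDF", "notes"])
print(plan)
```

Deciding by file extension is a deliberate simplification; a real pipeline would also need to inspect file contents, since an `.xml` file is not necessarily FoLiA or TEI.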
  • 25. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 26. What you can do • Find and select existing data – Virtual Language Observatory, Meertens Metadata Search, CLARIN-NL Portal • Create new data through OCR and orthographic normalisation – PICCL • Create metadata for new or existing data – CMDI Registry, CMDI profile editor, metadata editors (e.g. ARBIL), …
  • 27. What you can do • Make semantics of metadata and data explicit – ISOCAT, RELCAT, SchemaCAT • Now replaced by CLARIN Concept Registry (CCR) – CLAVAS • Enrich data with various kinds of annotations – TTNWW • Orthographic normalisation, POS-tagging, lemmatisation, parsing, named entity recognition, … – Adelheid, INPOLDER, PaQu, ColTime and EXILSEA extensions to ELAN • Upload enriched data to search applications – PaQu, AutoSearch
  • 28. What you can do • Search and browse in data, and analyze (meta)data and query results – OpenSONAR, GrETEL, PaQu, MIMORE, FESLI, SHEBANQ, AutoSearch, … – Arthurian Fiction, NameScape, COBWWWEB, eBNM+, C-DSD, DSS, RemBench, Nederlab, … – Interviews, WIP, VK, Polimedia, CKCC, DSS, AVResearcherXL, … – DUELME, WFT-GTB, CORNETTO, …
  • 29. What you can do • Visualize data analyses – COAVA, FESLI, MIMORE, Gabmap, SHEBANQ, Nederlab, OpenSONAR, … – CKCC, MIGMAP, AVResearcherXL • Store new data safely at a CLARIN Centre – All 5 centres have the Data Seal of Approval – 4 centres are certified CLARIN Centres
  • 30. Invitation • Use (elements from) the CLARIN infrastructure • Join user groups of specific services • Provide feedback so that we can further improve CLARIN • So that you can improve your research
  • 31. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 32. Education & Training • How do you learn to use these tools? – Courses / tutorials regularly organized – LOT summer / winter school courses – Demonstration scenarios and/or screencasts • E.g. for Gabmap, GrETEL, OpenSONAR – Educational modules via the portal: • https://dev.clarin.nl/node/CLARIN%20Educational%20Packages – Helpdesk: helpdesk@clarin.nl
  • 33. Education & Training • Do you want to know more? – Visit the CLARIN-NL portal • http://portal.clarin.nl – View the CLARIN-NL movies • http://www.clarin.nl/node/403 – Visit the demonstrations today – Ask me (or others) today
  • 34. Overview • Infrastructure Core – CLARIN Centres – Metadata and Searching for data – Federated Content Search • Resource Curation – Data Curation – Software Curation & Web Applications • Interoperability • What you can do • Education and Training • Conclusions
  • 35. Conclusions (1) • CLARIN is starting to provide the data, facilities and services to carry out humanities research supported by large amounts of data and tools • With easy interfaces and easy search options (no technical background needed) • Some training in using the tools is needed – To use the possibilities optimally – To understand the limitations of the data and the tools – Educational modules for selected functionality are available – Tutorials / training sessions will continue to be organized regularly
  • 36. Conclusions (2) • But there is still a lot to do – Extensions of and improvements in metadata – Improvements of VLO – Improved functionality for most tools • Need / desire found by actual use of the tools – Extend and improve search options for individual resources – Create options for searching across different resources of the same type – Improved interoperability
  • 37. Conclusions (3) • A successor project is needed! • CLARIAH www.clariah.nl • Proposal approved June 1, 2014 • Started Jan 1, 2015 • Kick-off this afternoon
  • 38. THANKS FOR YOUR ATTENTION!