



Presentation at the CLARIAH tech day on the possibility of automatically transforming CMDI files into their RDF equivalents

Published in: Software


  1. Daan Broeder, Menzo Windhouwer, Meertens Instituut
  2. • Create an interoperable domain of Language Resources (LR)
       – Interoperable formats for LR content
       – Persistent identification (and citation) of LRs
       – Use of SAML-based AAI for access to LRs
       – Use of the Component Metadata Infrastructure (CMDI) for describing LRs
  3. • Created as a response to a fragmented situation of LR metadata
     • Flexible
       – Not a single schema, but supports different metadata schemas
       – Different schemas for different situations
       – Semantic interoperability via linking to semantic registries
     • Community driven
       – Communities can model their own metadata schema
       – They know their data and can create the right schema
       – They know the right terminology
     • Sharing
       – Concepts, terminology, vocabularies
         • CLARIN Concept Registry for linguistic concepts
         • ISO 639 and other relevant vocabularies
         • CLAVAS for organisation names
       – Components & profiles via the CLARIN metadata component registry
  4. • A Component groups together metadata Elements that naturally belong together to describe a property of the resources
       – The Location where a SpeechRecording took place
       – The Location of an Actor
       – A Location is described by an address and/or region and/or country and/or continent
     • Components can be nested
       – The Language a specific Actor speaks
       – An Actor who takes part in a SpeechRecording for a specific Project
     • A Profile is a specific collection of Components for a specific type of resource, e.g., speech recordings
     [Diagram labels (P = profile, C = component, E = element): SpeechRecording (P); Actor (C); Location (C) with address, region, country, continent (E); Location (C); Project (C); Language (C); Language (C); Technical Metadata (C)]
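The nesting of components and elements described on slide 4 can be sketched as a small tree structure. This is only an illustration, not the CMDI data model itself: the class names `Element` and `Component`, the `paths` helper, the `name` element of Language, and the example values are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Element:
    """A single metadata field, e.g. 'country' (hypothetical class)."""
    name: str
    value: str


@dataclass
class Component:
    """A named group of elements that may also nest other components."""
    name: str
    elements: List[Element] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)

    def paths(self, prefix: str = "") -> Iterator[Tuple[str, str]]:
        """Yield (path, value) for every element at or below this component."""
        here = f"{prefix}/{self.name}"
        for element in self.elements:
            yield f"{here}/{element.name}", element.value
        for component in self.components:
            yield from component.paths(here)


# A profile is modelled here as the top-level component (illustrative values).
location = Component("Location", elements=[Element("country", "Netherlands")])
language = Component("Language", elements=[Element("name", "Dutch")])
actor = Component("Actor", components=[location, language])
profile = Component("SpeechRecording", components=[actor])

for path, value in profile.paths():
    print(path, "=", value)
```

Reusing the same `Location` component under both the recording and the actor is what the slide means by components that "naturally belong together" being shared across contexts.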
  5. [Diagram labels: OAI-PMH provider; OAI-PMH harvester; local metadata repository; joint metadata repository; metadata modeler; metadata user; metadata creator; component registry & editor; metadata editor; metadata curator (twice); metadata catalogue; Relation Registry; search & semantic mapping; resources; Concept Registry]
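Slide 5 shows metadata flowing from OAI-PMH providers to a harvester. A minimal sketch of the harvester side of that exchange, using only the standard library, might look as follows. The function names, the `cmdi` metadata prefix default, and the `example.org` endpoint are assumptions for illustration; the `ListRecords` verb and the `resumptionToken` paging mechanism come from the OAI-PMH protocol itself. No network request is made here.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: str = "cmdi",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL (hypothetical helper).

    When continuing a paged harvest, resumptionToken is an exclusive
    argument, so metadataPrefix is left out of the follow-up request.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumption token of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hand-written sample response, reduced to the part the sketch reads.
SAMPLE_PAGE = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<ListRecords><resumptionToken>abc123</resumptionToken></ListRecords>"
    "</OAI-PMH>"
)

first = list_records_url("https://example.org/oai")
follow_up = list_records_url("https://example.org/oai",
                             resumption_token=next_token(SAMPLE_PAGE))
print(first)
print(follow_up)
```

A real harvester would fetch each URL, store the returned records, and repeat until `next_token` returns `None`.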
  6. • Started in 2010; version 1.2, released in 2016, supports remote vocabularies
     • Actively supported by CLARIN ERIC and several national CLARIN consortia
     • Many supporting tools:
       – VLO, COMEDI, ARBIL, CMDI Maker, Virtual Collection Registry …
     • Link to the Linked (Open) Data world: CMDI2RDF
     [Diagram labels: CMDI; CMDI2RDF; LOD]
  7. • Started as a 2014 CLARIN NL project by TLA/MPI and DANS
     • Now a service supported by CLARIAH WP2 (X11.400)
     • Linking also to other ‘linguistic’ LoD information sources:
       – WALS for linguistic typology information
       – CLAVAS organization names
       – DBpedia (currently only used as glue)
     • Automatic synchronization with the CMDI metadata
     • Simplification of the RDFS CMDI model
  8. • CMD is classic XML constrained by a W3C XML Schema
     • To map a CMD record to RDF we need
       – A mapping of the basic component model to RDFS
         • Basic classes and properties to represent profiles, components, elements, attributes and their relationships and values
       – A mapping of a specific profile or component to RDFS
         • A specific subclass or subproperty of the basic component model
       – A mapping of specific metadata records to RDF instances of that RDFS
         • Instances of a profile or component
       – Additionally, there is a generic CMD envelope that is mapped using common LOD vocabularies
  9. • Basic CMD model is described by ISO/DIS 24622-1
       – 1st part of ISO TC 37 SC 4 3 CMD standards family
     • Natural mapping to RDF would be:
       – Profiles/components to RDF Classes
       – Elements to RDF Properties
     • Complication
       – CLARIN’s CMDI allows attributes on both Components and Elements
       – So elements have to be RDF Classes as well
  10. • Nevertheless, this introduces extra hierarchy
      • CMDI is already a hierarchical metadata schema
      • Human readability decreases
      • Other solutions welcome!
      Simplified example:
        Without attributes: <Description URI= …. > <Age>14</Age> … </Description>
          [Graph: R has property Age with value 14]
        With an attribute: <Description…. > <Age status='U'>14</Age> … </Description>
          [Graph: R links to an Age node that carries the value 14 and the status U]
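The complication from slides 9 and 10 (attributes force every element to become a node of its own rather than a plain property) can be sketched in a few lines. This is not the CMD2RDF implementation, which the slides describe as XSLT in a Java pipeline; it is a simplified stand-in. The `example.org` namespace, the `has…`, `attr-…` and `value` property names, and the blank-node labels are all assumptions, and the sketch does not distinguish literals from resources.

```python
import xml.etree.ElementTree as ET
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/cmd#"  # hypothetical namespace, not the real one


def record_to_triples(xml_text, subject="_:record"):
    """Turn an XML record into (subject, predicate, object) triples.

    Every XML element gets its own node typed by its tag, so that XML
    attributes such as status='U' have something to attach to.
    """
    counter = count(1)
    triples = []

    def visit(element, node):
        triples.append((node, RDF_TYPE, EX + element.tag))
        for name, value in element.attrib.items():
            triples.append((node, EX + "attr-" + name, value))
        text = (element.text or "").strip()
        if text:
            triples.append((node, EX + "value", text))
        for child in element:
            child_node = f"_:b{next(counter)}"
            triples.append((node, EX + "has" + child.tag, child_node))
            visit(child, child_node)

    visit(ET.fromstring(xml_text), subject)
    return triples


triples = record_to_triples('<Description><Age status="U">14</Age></Description>')
for triple in triples:
    print(triple)
```

The extra `_:b1` node between the record and the value `14` is the additional hierarchy the slide mentions: a direct `Age` property would be more readable but would leave no place for `status`.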
  11. [Diagram labels: OAI harvester; CLARIN joint metadata domain; CMD2RDF (conversion, enrichment); Virtuoso caching CMD-RDF (SPARQL, REST, browse); (L)L(O)D cloud; Component Registry; CLAVAS; WALS]
      Technology:
      • Virtuoso RDF store
      • Elda as browser
      • Tomcat as application server
      • Conversion pipeline in Java
      • Core transforms in XSLT
      • All source code on GitHub
      • Docker build file & images available
  13. • Offers LoD for different LR metadata infrastructures
        – LRE Map (LREC)
        – META-SHARE
        – CLARIN
        – DataHub (linguistic part)
      • However
        – With respect to CLARIN, only data with DC profiles
          • Just a small part of CLARIN
        – Seems partly based on old, static data dumps
  14. • Goals:
        – Find metadata-type information about LRs in LD format
        – Translate that into a ‘suitable’ CMDI profile-based metadata record
      • Is there such LD that is not already available directly in another format (OLAC, CLARIN, DC, META-SHARE)?
        – If so, it is useful to have this metadata in the CLARIN VLO metadata catalogue
        – Humanities data archives will mostly have DC (inventories are available from different projects, e.g. DASISH) and frequently offer LD
        – Easier ways exist to translate DC into CMDI (e.g. the CMDI DC profile)
        – But LD can be a pivot set for many such translations
      • Still in exploratory phase
        – Would like to use a general strategy
        – It is very labor-intensive to craft specific transformations for every LD set
  15. • Useful for CLARIN?
        – Enriching existing CMDI metadata and recycling them
        – Relations to sources already known, such as:
          • WALS, DBpedia, CLAVAS, Glottolog, …
          • Relations to CLARIAH LD sources?
        – Enable the VLO (or an alternative browser) to visualize this information
        – Increasing metadata quality:
          • Use CLAVAS to repair errors
          • Include preferred labels
        – Some CMDI adaptations required
          • Foreign namespace support in CMDI payload
      [Diagram labels: A; B; C; VLO; RDF2CMD; CLARIN centres; CLARIAH?; Enriched CMDI; CMDI; DBpedia; Glottolog; RDF store]
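The quality step on slide 15 (repairing names with a vocabulary and including preferred labels) can be illustrated with a simple lookup. This is a sketch under stated assumptions: it does not call CLAVAS, the vocabulary entries below are invented for the example, and `build_index` and `repair` are hypothetical helper names. A real service would also need fuzzier matching than case and whitespace normalisation.

```python
def normalise(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.casefold().split())


def build_index(vocabulary):
    """Map every known spelling to its preferred label.

    `vocabulary` maps a preferred label to a list of alternative labels,
    loosely modelled on SKOS prefLabel/altLabel.
    """
    index = {}
    for preferred, alternatives in vocabulary.items():
        for label in (preferred, *alternatives):
            index[normalise(label)] = preferred
    return index


def repair(name, index):
    """Return the preferred label for a name, or the name unchanged."""
    return index.get(normalise(name), name)


# Invented entries for illustration only.
VOCABULARY = {"Meertens Instituut": ["Meertens Institute", "Meertens Inst."]}
INDEX = build_index(VOCABULARY)

print(repair("  meertens   institute ", INDEX))
print(repair("Unknown Organisation", INDEX))
```

Returning unknown names unchanged keeps the enrichment non-destructive: records are only altered where the vocabulary gives a confident match.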