CMDI2RDF


  1. Daan Broeder, Menzo Windhouwer, Meertens Instituut
  2. • Create an interoperable domain of Language Resources (LR)
       – Interoperable formats for LR content
       – Persistent identification (and citation) of LRs
       – Use of SAML-based AAI for access to LRs
       – Use of the Component Metadata Infrastructure (CMDI) for describing LRs
  3. • Created as a response to a fragmented situation of LR metadata
     • Flexible
       – Not a single schema, but support for different metadata schemas
       – Different schemas for different situations
       – Semantic interoperability via linking to semantic registries
     • Community driven
       – Communities can model their own metadata schemas
       – They know their data and can create the right schema
       – They know the right terminology
     • Sharing
       – Concepts, terminology, vocabularies
         • CLARIN Concept Registry for linguistic concepts
         • ISO 639 and other relevant vocabularies
         • CLAVAS for organisation names
       – Components & profiles via the CLARIN metadata component registry
  4. • A Component groups together metadata Elements that naturally belong together to describe a property of the resources
       – The Location where a SpeechRecording took place
       – The Location of an Actor
       – A Location is described by an address and/or region and/or country and/or continent
     • Components can be nested
       – The Language a specific Actor speaks
       – An Actor who takes part in a SpeechRecording for a specific Project
     • A Profile is a specific collection of Components for a specific type of resource, e.g., speech recordings
     [Diagram: the SpeechRecording profile (P) with components (C) Actor, Location (shown twice), Project, Language (shown twice) and Technical Metadata; Location has the elements (E) address, region, country and continent.]
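The component model on the slide above can be sketched as a small data structure. This is only an illustration: the class names, the `paths` helper and the exact nesting of the components are assumptions, since the flattened diagram does not show which component sits inside which.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """A single metadata field, e.g. `country`."""
    name: str


@dataclass
class Component:
    """A reusable group of elements that may nest other components."""
    name: str
    elements: List[Element] = field(default_factory=list)
    components: List["Component"] = field(default_factory=list)


@dataclass
class Profile(Component):
    """The top-level component describing one type of resource."""


def paths(node: Component, prefix: str = "") -> List[str]:
    """List every element as a slash-separated path from the profile root."""
    here = f"{prefix}/{node.name}" if prefix else node.name
    found = [f"{here}/{e.name}" for e in node.elements]
    for child in node.components:
        found.extend(paths(child, here))
    return found


def location() -> Component:
    # One definition, instantiated wherever a location is needed.
    names = ("address", "region", "country", "continent")
    return Component("Location", [Element(n) for n in names])


# Assumed nesting: the Actor has its own Location and Language.
# Components without elements on the slide are left empty here.
speech_recording = Profile("SpeechRecording", components=[
    Component("Actor", components=[location(), Component("Language")]),
    location(),
    Component("Project"),
    Component("Language"),
    Component("TechnicalMetadata"),
])

for p in paths(speech_recording):
    print(p)
```

Reusing `location()` in two places mirrors the point of the slide: the same component describes both where a recording took place and where an actor is located.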
  5. [Diagram of the CMDI infrastructure. Labels: metadata modeler, metadata creator, metadata curator, metadata user; component registry & editor, metadata editor, local metadata repository, OAI-PMH provider, OAI-PMH harvester, joint metadata repository, metadata catalogue, search & semantic mapping, Relation Registry, Concept Registry, Resources.]
  6. • Started in 2010; version 1.2, released in 2016, supports remote vocabularies
     • Actively supported by CLARIN ERIC and several national CLARIN consortia
     • Many supporting tools:
       – VLO, COMEDI, ARBIL, CMDI maker, Virtual Collection Registry …
     • Link to the Linked (Open) Data world: CMDI2RDF
     [Diagram: CMDI and LOD, connected by CMDI2RDF.]
  7. • Started as a 2014 CLARIN NL project by TLA/MPI and DANS
     • Now a service supported by CLARIAH WP2 (X11.400)
     • Also links to other ‘linguistic’ LoD information sources:
       – WALS for linguistic typology information
       – CLAVAS for organization names
       – DBpedia (currently only used as glue)
     • Automatic synchronization of CMDI metadata
     • Simplification of the RDFS CMDI model
  8. • CMD is classic XML constrained by a W3C XML Schema
     • To map a CMD record to RDF we need
       – A mapping for the basic component model to RDFS
         • Basic classes and properties to represent profiles, components, elements, attributes and their relationships and values
       – A mapping for a specific profile or component to RDFS
         • A specific subclass or subproperty of the basic component model
       – A mapping for specific metadata records to RDF instances of RDFS
         • Instances of a profile or component
       – Additionally, there is a generic CMD envelope that is mapped using common LOD vocabularies
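The three mapping levels listed above can be sketched with plain triples. This is a minimal illustration and not the vocabulary that CMD2RDF actually emits: the `cmdm:`, `ex:` and `rec:` prefixes, the record identifier and all function names are hypothetical. Only `rdf:type`, `rdfs:Class` and `rdfs:subClassOf` are standard RDF/RDFS terms.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS = "rdfs:subClassOf"


def model_triples():
    """Level 1: basic classes of the generic component model."""
    names = ("Profile", "Component", "Element", "Attribute")
    return [(f"cmdm:{n}", RDF_TYPE, "rdfs:Class") for n in names]


def profile_triples(profile, components):
    """Level 2: a specific profile and its components as subclasses."""
    triples = [(f"ex:{profile}", RDFS_SUBCLASS, "cmdm:Profile")]
    triples += [(f"ex:{c}", RDFS_SUBCLASS, "cmdm:Component")
                for c in components]
    return triples


def record_triples(record_id, profile):
    """Level 3: one metadata record as an instance of its profile class."""
    return [(f"rec:{record_id}", RDF_TYPE, f"ex:{profile}")]


def to_lines(triples):
    """Render triples in a Turtle-like 'subject predicate object .' form."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


graph = (model_triples()
         + profile_triples("SpeechRecording", ["Actor", "Location"])
         + record_triples("r42", "SpeechRecording"))  # "r42" is made up
print("\n".join(to_lines(graph)))
```

Keeping the levels in separate functions reflects the slide: the model-level mapping is written once, the profile-level mapping once per profile or component, and the record-level mapping runs for every harvested record.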
  9. • Basic CMD model is described by ISO/DIS 24622-1
       – 1st part of ISO TC 37 SC 4 3 CMD standards family
     • Natural mapping to RDF would be:
       – Profiles/components to RDF Classes
       – Elements to RDF Properties
     • Complication
       – CLARIN’s CMDI allows attributes on both Components and Elements
       – So elements have to be RDF Classes as well
  10. • Nevertheless, this introduces extra hierarchy
        – CMDI is already a hierarchical metadata schema
        – Human readability decreases
      • Other solutions welcome!
      [Simplified example: without attributes, <Age>14</Age> inside <Description URI= …. > maps to a direct Age edge from resource R to the value 14. With an attribute, <Age status='U'>14</Age> maps to an intermediate Age node that carries both the value 14 and the status U.]
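The trade-off in the simplified example above can be made concrete. The sketch below contrasts the direct mapping (element as a property) with the mapping needed once attributes are allowed (element as a node of its own). The `ex:` property names and the blank-node labels are hypothetical and are not taken from the CMD2RDF vocabulary.

```python
import xml.etree.ElementTree as ET
from itertools import count


def flat_triples(parent, xml_fragment):
    """Element as a property: one triple, but attributes have no place."""
    elem = ET.fromstring(xml_fragment)
    return [(parent, f"ex:{elem.tag}", f'"{(elem.text or "").strip()}"')]


def node_triples(parent, xml_fragment, ids=None):
    """Element as a class instance: an extra node holds value and attributes."""
    if ids is None:
        ids = count(1)
    elem = ET.fromstring(xml_fragment)
    node = f"_:e{next(ids)}"  # blank node standing for this element
    triples = [
        (parent, "ex:containsElement", node),
        (node, "rdf:type", f"ex:{elem.tag}"),
        (node, "ex:value", f'"{(elem.text or "").strip()}"'),
    ]
    for name, value in elem.attrib.items():
        triples.append((node, f"ex:attr_{name}", f'"{value}"'))
    return triples


plain = flat_triples("ex:R", "<Age>14</Age>")
with_status = node_triples("ex:R", '<Age status="U">14</Age>')

for s, p, o in plain + with_status:
    print(s, p, o, ".")
```

One triple becomes four, and the value is now two hops away from the resource, which is the extra hierarchy and the loss of readability that the slide mentions.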
  11. [Architecture diagram: an OAI harvester fills the CLARIN joint metadata domain; CMD2RDF (conversion, enrichment) draws on the Component Registry, CLAVAS and WALS, with caching, and loads CMD-RDF into Virtuoso; access is via SPARQL, REST and browsing; the result links into the (L)L(O)D cloud.]
      Technology:
      • Virtuoso RDF store
      • Elda as browser
      • Tomcat as application server
      • Conversion pipeline in Java
      • Core transforms in XSLT
      • All source code on GitHub
      • Docker build file & images available
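Because the converted metadata is exposed over SPARQL, a client can query it with a plain HTTP GET as defined by the SPARQL 1.1 Protocol. The sketch below only shows how such a request could be built. The endpoint URL is a placeholder, since the slides do not give the path of the SPARQL endpoint, and the query is a generic class count rather than anything specific to CMD-RDF.

```python
import json
import urllib.parse
import urllib.request

# Placeholder: the real endpoint path is not stated in the slides.
ENDPOINT = "http://example.org/sparql"

# Generic query: the ten most frequently instantiated classes.
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(endpoint, query):
    """Send the request and return the list of result bindings."""
    with urllib.request.urlopen(build_request(endpoint, query)) as resp:
        return json.load(resp)["results"]["bindings"]


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

`run_query` is deliberately not called here, because it needs a reachable endpoint; pointing `ENDPOINT` at a real SPARQL service is enough to try it.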
  12. (no text on this slide)
  13. • Offers LoD for different LR metadata infrastructures
        – LRE Map (LREC)
        – META-SHARE
        – CLARIN
        – DataHub (linguistic part)
      • However
        – With respect to CLARIN, only data with DC profiles
          • Just a small part of CLARIN
        – Seems partly based on old, static data dumps
  14. • Goals:
        – Find metadata-type information about LRs in LD format
        – Translate that into a ‘suitable’ CMDI-profile-based metadata record
      • Is there such LD that is not already directly available in another format (OLAC, CLARIN, DC, META-SHARE)?
        – If so, it is useful to have this metadata in the CLARIN VLO metadata catalogue
        – Humanities data archives will mostly have DC (an inventory is available from different projects, e.g. DASISH) and frequently offer LD
        – Easier ways exist to translate DC into CMDI (e.g. the CMDI DC profile)
        – But LD can be a pivot set for many such translations
      • Still in the exploratory phase
        – Would like to use a general strategy
        – It is very labor intensive to craft specific transformations for every LD set
  15. • Useful for CLARIN?
        – Enriching existing CMDI metadata and recycling them
        – Relations to sources already known, such as:
          • WALS, DBpedia, CLAVAS, Glottolog, …
          • Relations to CLARIAH LD sources?
        – Enable the VLO (or an alternative browser) to visualize this information
        – Increasing metadata quality:
          • Use CLAVAS to repair errors
          • Include preferred labels
        – Some CMDI adaptations required
          • Foreign namespace support in the CMDI payload
      [Diagram: CLARIN centres A, B and C provide CMDI to an RDF store, which is linked to DBpedia and Glottolog; RDF2CMD turns the result into enriched CMDI for the VLO; CLARIAH? is shown as a possible further source.]
  16. http://cmdi2rdf.meertens.knaw.nl/cmd2rdf/

Editor's Notes

  • Virtuoso as a triplestore
    Tomcat as application server
    Elda as browser

    Conversion pipeline in Java, core transforms in XSLT

    All in a Docker package
    Code all on GitHub: