
ISOcat and RELcat, two cooperating semantic registries

senior scientific software engineer at Data Archiving and Networked Services (DANS)
Jan. 20, 2014


  1. www.isocat.org ISOcat and RELcat: 2 cooperating Semantic Registries Menzo Windhouwer menzo.windhouwer@dans.knaw.nl The Language Archive – DANS Ineke Schuurman ineke@ccl.kuleuven.be KU Leuven, CLARIN-NL – Utrecht University 17 January 2014 CLIN 24 1
  2. Outline
     • The need for explicit semantics
       – ISOcat
     • Mapping issues
       – Languages, theoretical frameworks
       – Granularity levels
       – RELcat
     • CGN case study
     • Conclusions and future work
  3. Typological Database Nijmegen
     TOP NOTION tds:Noun GROUPS{
       NOTION tdn:GrammaticalDistinctions
       LABEL "Grammatical distinctions for nouns."
       GROUPS {
         NOTION tdn:AgentNouns
         LABEL "Agent nouns."
         DESCRIPTION "Nouns can function as the agent of a clause."
         LINK TO CONCEPT agentRole
         GROUPS {
           NOTION tdn:v098_plusAffix
           LABEL "Agent nouns formed by verb stem plus affix."
           LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
           DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
           NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
           IS FIELD v098;
           ...
     Notes: TDN is not archived in TLA but curated in TDS, a previous project Menzo worked on, and is now archived at DANS. Also, this is not a TDN punchcard.
  4. DOBES corpora
  5. ISOcat
     • An open Data Category/Concept Registry where everyone can
       – find and select data categories/concepts
       – create new data categories/concepts
       – share data categories/concepts
     • Each data category/concept has a Persistent Identifier, which can be embedded in a resource (schema) to make the intended semantics (more) explicit
  6. Mapping issues
     • Interesting resources for a specific research question might
       – use very different theoretical frameworks, which might share few or no data categories/concepts
       – use coarser- or finer-grained data categories/concepts
     • How to overcome these differences by mapping data categories/concepts to each other?
  7. Some examples
     • definite article (PoS)
       – EN: 1 (-)
       – FR: 2 (masc, fem)
       – NL: 2 (neuter, non-neuter)
       – DE: 3 (masc, fem, neuter)
     Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’.
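The relation suggested for Dutch ‘non-neuter’ can be illustrated with a small lookup. This is only a sketch: the dictionary layout, the `RELATED` table and the function name `map_value` are assumptions made for illustration, and the slide does not say which relationship type should link the values.

```python
# Gender values of the definite article per language, as listed on the slide.
ARTICLE_GENDERS = {
    "EN": [],
    "FR": ["masc", "fem"],
    "NL": ["neuter", "non-neuter"],
    "DE": ["masc", "fem", "neuter"],
}

# Hypothetical relation table; only this one entry is taken from the slide.
RELATED = {"non-neuter": {"masc", "fem"}}


def map_value(value, target_lang):
    """Return the target language's values that equal or relate to `value`."""
    candidates = {value} | RELATED.get(value, set())
    return [v for v in ARTICLE_GENDERS[target_lang] if v in candidates]


german = map_value("non-neuter", "DE")
french = map_value("neuter", "FR")
```

Without the `RELATED` entry, a search for Dutch ‘non-neuter’ would find nothing in German or French resources, which is the discovery gap a relation registry is meant to close.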
  8. Some examples
     • Indirect object (syntax)
       – EN: indirect object
       – NL:
         • meewerkend voorwerp (1), or
         • meewerkend voorwerp (2) plus belanghebbend voorwerp
       – All translated as ‘indirect object’
     => 3 definitions of ‘indirect object’; the relations between them are to be shown!
  9. Some examples
     • Event (semantics)
       – ISO-TimeML: event and state, where ‘state’ is a type of event
       – Other theories (Kamp & Reyle, etc.): eventuality, with two subtypes: ‘event’ and ‘state’
     The concepts ‘eventuality’, ‘event’ and ‘state’ are to be related.
  10. ISOcat internal issues
      Data categories that are almost the same, apart from type, profile, language, …
      Currently we insert a new DC. Note, however, that the original and the new one should be marked as having a same-as relation.
  11. RELcat
      • A Relation Registry (under construction) to store
        – (almost) same-as relationships
        – subsumption relationships (isSuperClassOf, isSubClassOf)
        – mereology relationships (isPartOf, hasPart)
        – …
        between data categories/concepts
      • The focus is on informal and possibly partial ontologies to be used for resource discovery
      • Based on RDF triples
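To make the triple-based idea concrete, here is a minimal sketch using plain Python tuples rather than an RDF library. The relationship type names come from the slide. The `Triple` class, the `INVERSE` table and the concept names used as data are illustrative assumptions, and this is not RELcat's actual data model. Treating (almost) same-as as symmetric is also an assumption.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Relationship types named on the slide, paired with their inverses.
# Assumption: (almost) same-as is treated as symmetric here.
INVERSE = {
    "isSuperClassOf": "isSubClassOf",
    "isSubClassOf": "isSuperClassOf",
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
    "sameAs": "sameAs",
}


def with_inverses(triples):
    """Return the given triples plus the inverse of each one."""
    closed = set(triples)
    for t in triples:
        closed.add(Triple(t.obj, INVERSE[t.predicate], t.subject))
    return closed


stored = [
    Triple("eventuality", "isSuperClassOf", "event"),
    Triple("eventuality", "isSuperClassOf", "state"),
]
closed = with_inverses(stored)
```

Deriving inverses on read means each relation only has to be stated in one direction, which suits an informal and possibly partial ontology.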
  12. CGN case study
      • Atomic building blocks of CGN tags are defined in ISOcat (still private)
      • The EBNF schema of a CGN tag is stored in SCHEMAcat
      • The subsumption relations in the value domains are stored in RELcat
      • (Almost) same-as relationships with other data categories/concepts are also stored in RELcat
  13. CGN granularity mappings
      • How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
        – Example: N(SOORT) = Common Noun
      • Based on the CGN EBNF, this involves the following slots of the /CGN tag/
        – /PoS/ = /N/
        – /NTYPE/ = /SOORT/
      • How to express this in RDF?
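The step from a tag string such as N(SOORT) to its slots can be sketched as follows. The regular expression is a toy pattern and not the CGN EBNF stored in SCHEMAcat. The `SLOTS` table and the function name `parse_cgn_tag` are assumptions, and only the slot name NTYPE for nouns is taken from the slide.

```python
import re

# Assumption: slot names per part of speech; only N -> NTYPE is from the slide.
SLOTS = {"N": ["NTYPE"]}


def parse_cgn_tag(tag):
    """Split a tag of the form HEAD(V1,V2,...) into named slots."""
    match = re.fullmatch(r"(\w+)\(([^()]*)\)", tag)
    if match is None:
        raise ValueError(f"not a tag of the form HEAD(VALUES): {tag!r}")
    pos, raw = match.groups()
    values = [v.strip() for v in raw.split(",") if v.strip()]
    slots = {"PoS": pos}
    names = SLOTS.get(pos, [])
    for i, value in enumerate(values):
        # Values without a known slot name get a positional placeholder name.
        name = names[i] if i < len(names) else f"slot{i + 1}"
        slots[name] = value
    return slots


common_noun = parse_cgn_tag("N(SOORT)")
```

The result pairs each atomic value with its slot, which is the information a mapping needs when one target concept corresponds to several atomic data categories.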
  14. RELcat RDF mapping
      • Data categories/concepts can function as subjects and objects in an RDF triple
      • The predicate of an RDF triple is a RELcat relationship type
      • Alternative: complex data categories as properties
  15. N(SOORT) = Common Noun [Diagram. Nodes: CGN tag, Common Noun. Edge labels: isA, sameAs.]
  16. N(SOORT) = Common Noun [Diagram. Nodes: CGN tag, PoS, NTYPE, N, SOORT, Common Noun. Edge labels: isA, hasPart, hasPotentialValue, sameAs. Annotations: "has more parts", "has more potential values".]
  17. N(SOORT) = Common Noun [Diagram. Nodes: CGN tag, PoS, NTYPE, N, SOORT, Common Noun. Edge labels: isA, hasPart, hasValue, hasPotentialValue, sameAs. Annotations: "has more parts", "has more potential values".]
  18. N(SOORT) = Common Noun [Diagram with the same nodes, edge labels and annotations as slide 17.]
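The diagrams on slides 15 to 18 survive here only as node and edge labels, so the following is one possible reading of them and not a reconstruction of the authors' graph. It assumes an anonymous node for the concrete tag N(SOORT) that is a CGN tag, has one part per slot, and is declared sameAs Common Noun. The blank-node names (`_:tag`, `_:pos`, `_:ntype`) and the helper functions are hypothetical.

```python
TRIPLES = {
    # Schema level: the CGN tag has slots, and slots have potential values.
    ("CGN tag", "hasPart", "PoS"),
    ("CGN tag", "hasPart", "NTYPE"),
    ("PoS", "hasPotentialValue", "N"),
    ("NTYPE", "hasPotentialValue", "SOORT"),
    # Instance level: hypothetical blank nodes for the concrete tag N(SOORT).
    ("_:tag", "isA", "CGN tag"),
    ("_:tag", "hasPart", "_:pos"),
    ("_:tag", "hasPart", "_:ntype"),
    ("_:pos", "isA", "PoS"),
    ("_:pos", "hasValue", "N"),
    ("_:ntype", "isA", "NTYPE"),
    ("_:ntype", "hasValue", "SOORT"),
    ("_:tag", "sameAs", "Common Noun"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a stored triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def slots_of(node):
    """Collect {slot type: value} over the parts of an instance node."""
    result = {}
    for part in objects(node, "hasPart"):
        for slot in objects(part, "isA"):
            for value in objects(part, "hasValue"):
                result[slot] = value
    return result


def same_as(slots):
    """Concepts declared sameAs a CGN tag instance with exactly these slots."""
    found = set()
    for s, p, o in TRIPLES:
        if p == "isA" and o == "CGN tag" and slots_of(s) == slots:
            found |= objects(s, "sameAs")
    return found


match = same_as({"PoS": "N", "NTYPE": "SOORT"})
```

Even in this reduced form, a single complex mapping needs a dozen triples, which illustrates why such mappings quickly become hard to write by hand.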
  19. Cooperation between ISOcat and RELcat
      • ISOcat: value domains of closed data categories
        – RELcat: hasPotentialValue (new relationship type)
      • ISOcat: is-a relations between simple data categories
        – RELcat: subsumption relations
      • SCHEMAcat: part-of relationships
        – RELcat: mereology relationships
  20. Conclusions and future work
      • Simple mappings are easy
      • Complex mappings easily become fairly complex
        – UI support?
        – DSL support?
        – Alternative RDF mapping?
      • User front-end for RELcat
        – Integration of RELcat and ISOcat?
  21. Other examples
      • “JJR” -> “POS=adjective & degree=comparative”
      • “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
      • “VVIMP” -> “POS=verb & main verb & mood=imperative”
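These examples all map one tag onto a conjunction of feature constraints. A minimal sketch of splitting such an expression into its terms is shown below. The expressions are copied from the slide, while the function name `decompose` and the choice to represent a bare term such as "main verb" with a value of `None` are assumptions.

```python
# Tag-to-feature mappings as given on the slide.
MAPPINGS = {
    "JJR": "POS=adjective & degree=comparative",
    "Transitive": "thetavp=vp120 & synvps=[synNP] & caseAssigner=True",
    "VVIMP": "POS=verb & main verb & mood=imperative",
}


def decompose(expression):
    """Split 'a=b & c & d=e' into (name, value) pairs; bare terms get None."""
    features = []
    for term in expression.split("&"):
        name, sep, value = term.partition("=")
        features.append((name.strip(), value.strip() if sep else None))
    return features


jjr = decompose(MAPPINGS["JJR"])
vvimp = decompose(MAPPINGS["VVIMP"])
```

Values are kept as uninterpreted strings, so `[synNP]` and `True` are not parsed any further.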