
ISOcat and RELcat, two cooperating semantic registries


M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.


  1. www.isocat.org
     ISOcat and RELcat: 2 cooperating Semantic Registries
     Menzo Windhouwer, menzo.windhouwer@dans.knaw.nl, The Language Archive – DANS
     Ineke Schuurman, ineke@ccl.kuleuven.be, KU Leuven, CLARIN-NL – Utrecht University
     17 January 2014, CLIN 24
  2. Outline
     • The need for explicit semantics
       – ISOcat
     • Mapping issues
       – Languages, theoretical frameworks
       – Granularity levels
       – RELcat
     • CGN case study
     • Conclusions and future work
  3. Typological Database Nijmegen
     TOP NOTION tds:Noun GROUPS{
       NOTION tdn:GrammaticalDistinctions
       LABEL "Grammatical distinctions for nouns."
       GROUPS {
         NOTION tdn:AgentNouns
         LABEL "Agent nouns."
         DESCRIPTION "Nouns can function as the agent of a clause."
         LINK TO CONCEPT agentRole
         GROUPS {
           NOTION tdn:v098_plusAffix
           LABEL "Agent nouns formed by verb stem plus affix."
           LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
           DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
           NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
           IS FIELD v098;
           ...
     Notes: TDN is not archived in TLA, but curated in TDS, a previous project Menzo worked on, and now archived at DANS; also, this is not a TDN punchcard.
  4. DOBES corpora
  5. ISOcat
     • An open Data Category/Concept Registry where everyone can
       – find and select data categories/concepts
       – create new data categories/concepts
       – share data categories/concepts
     • Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
  6. Mapping issues
     • Interesting resources for a specific research question might
       – use very different theoretical frameworks, which might share few or no data categories/concepts
       – use coarser or finer-grained data categories/concepts
     • How to overcome these differences by mapping data categories/concepts to each other?
  7. Some examples
     • definite article (PoS)
       – EN: 1 (-)
       – FR: 2 (masc, fem)
       – NL: 2 (neuter, non-neuter)
       – DE: 3 (masc, fem, neuter)
     Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
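The remark about Dutch ‘non-neuter’ can be made concrete with a small sketch. The table layout and the function name below are hypothetical and are not part of ISOcat or RELcat; only the ‘non-neuter’ row comes from the slide, and the ‘neuter’ row is an assumption added for illustration.

```python
# Hypothetical relation table: each (language, value) pair lists the
# gender values it should be related to. Only the 'non-neuter' row is
# stated on the slide; the 'neuter' row is an assumption.
RELATED = {
    ("NL", "non-neuter"): {"masc", "fem"},
    ("NL", "neuter"): {"neuter"},
}


def values_matching(lang, wanted):
    """Return the values of `lang` that are related to the wanted value."""
    return {
        value
        for (language, value), targets in RELATED.items()
        if language == lang and wanted in targets
    }
```

With such a table, a search for ‘fem’ in a Dutch resource would be widened to the value ‘non-neuter’, which is the kind of resource discovery the mapping is meant to support.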
  8. Some examples
     • Indirect object (syntax)
       – EN: indirect object
       – NL:
         • meewerkend voorwerp (1), or
         • meewerkend voorwerp (2) plus belanghebbend voorwerp
       – All translated as ‘indirect object’
     => 3 definitions of ‘indirect object’; the relations between them need to be shown!
  9. Some examples
     • Event (semantics)
       – ISO-TimeML: event and state, where ‘state’ is a type of event
       – Other theories (Kamp & Reyle etc.): eventuality, with two subtypes: ‘event’ and ‘state’
     Concepts ‘eventuality’, ‘event’ and ‘state’ are to be related
  10. ISOcat internal issues
      Data categories that are almost the same, apart from type, profile, language, …
      Currently we insert a new DC. Note, however, that the original DC and the new one should be marked as having a same-as relation
  11. RELcat
      • A Relation Registry (under construction) to store
        – (almost) same-as relationships
        – subsumption relationships (isSuperClassOf, isSubClassOf)
        – mereology relationships (isPartOf, hasPart)
        – …
        between data categories/concepts
      • The focus is on informal and possibly partial ontologies to be used for resource discovery
      • Based on RDF triples
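How stored subsumption relationships could serve resource discovery can be sketched as follows. This is a minimal illustration, not RELcat's implementation: it assumes the relations are available as plain (child, parent) pairs, uses the ‘eventuality’ example from the earlier slide as data, and the function name is hypothetical.

```python
# Illustrative isSubClassOf pairs (child, parent), following the
# 'eventuality' example; plain strings stand in for persistent identifiers.
SUBCLASS_OF = {
    ("event", "eventuality"),
    ("state", "eventuality"),
}


def subsumed_by(concept, pairs=SUBCLASS_OF):
    """Return every concept that is a direct or indirect subclass of `concept`."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in pairs:
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found
```

A query for resources annotated with ‘eventuality’ could then also return resources annotated with ‘event’ or ‘state’.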
  12. CGN case study
      • Atomic building blocks of CGN tags are defined in ISOcat (still private)
      • The EBNF schema of a CGN tag is stored in SCHEMAcat
      • The subsumption relations in the value domains are stored in RELcat
      • (almost) same-as relationships with other data categories/concepts are also stored in RELcat
  13. CGN granularity mappings
      • How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
        – Example: N(SOORT) = Common Noun
      • Based on the CGN EBNF this involves the following slots of the /CGN tag/
        – /PoS/ = /N/
        – /NTYPE/ = /SOORT/
      • How to express this in RDF?
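The decomposition of N(SOORT) into slots can be sketched in a few lines. The sketch assumes a simplified tag shape with one part of speech and one feature in parentheses; the real CGN EBNF stored in SCHEMAcat is richer and is not reproduced here, and the function names are hypothetical.

```python
import re


def parse_cgn_tag(tag):
    """Split a tag such as 'N(SOORT)' into its /PoS/ and /NTYPE/ slots.

    Assumes the simplified shape POS(FEATURE); this is not the full CGN EBNF.
    """
    match = re.fullmatch(r"(\w+)\((\w+)\)", tag)
    if match is None:
        raise ValueError(f"unsupported tag shape: {tag!r}")
    pos, ntype = match.groups()
    return {"PoS": pos, "NTYPE": ntype}


# The slot values that together correspond to 'Common Noun'.
COMMON_NOUN = {"PoS": "N", "NTYPE": "SOORT"}


def is_common_noun(tag):
    """True if the tag fills both slots with the 'Common Noun' values."""
    return parse_cgn_tag(tag) == COMMON_NOUN
```

The point of the example is that ‘Common Noun’ corresponds to a combination of two atomic values, so a single same-as link from one atomic data category would not capture it.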
  14. RELcat RDF mapping
      • Data categories/concepts can function as subjects and objects in an RDF triple
      • The predicate of an RDF triple is a RELcat relationship type
      • Alternative: complex data categories as properties
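The triple structure described here can be sketched with plain Python tuples. No RDF library is used, the plain-string names stand in for persistent identifiers, and the four triples are one possible reading of the slot structure of the /CGN tag/, not an export from RELcat.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # a data category/concept
    predicate: str  # a RELcat relationship type
    obj: str        # a data category/concept


# Illustrative triples; the edge directions are an assumption.
TRIPLES = [
    Triple("CGN tag", "hasPart", "PoS"),
    Triple("CGN tag", "hasPart", "NTYPE"),
    Triple("PoS", "hasPotentialValue", "N"),
    Triple("NTYPE", "hasPotentialValue", "SOORT"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return the objects of all triples with the given subject and predicate."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]
```

For example, `objects("CGN tag", "hasPart")` lists the slots of the tag, and `objects("NTYPE", "hasPotentialValue")` lists the values recorded for one slot.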
  15. N(SOORT) = Common Noun
      [Diagram. Nodes: CGN tag, Common Noun. Relations: isA, sameAs.]
  16. N(SOORT) = Common Noun
      [Diagram. Nodes: CGN tag, PoS, NTYPE, N, SOORT, Common Noun. Relations: isA, hasPart, hasPotentialValue, sameAs. Annotations: "has more parts", "has more potential values".]
  17. N(SOORT) = Common Noun
      [Diagram. Nodes: CGN tag, PoS, NTYPE, N, SOORT, Common Noun. Relations: isA, hasPart, hasValue, hasPotentialValue, sameAs. Annotations: "has more parts", "has more potential values".]
  18. N(SOORT) = Common Noun
      [Diagram. Same labels as slide 17.]
  19. Cooperation between ISOcat and RELcat
      • ISOcat: value domains of closed data categories
        – RELcat: hasPotentialValue (new relationship type)
      • ISOcat: is-a relations between simple data categories
        – RELcat: subsumption relations
      • SCHEMAcat: part-of relationships
        – RELcat: mereology relationships
  20. Conclusions and future work
      • Simple mappings are easy
      • Complex mappings quickly become fairly complex
        – UI support?
        – DSL support?
        – Alternative RDF mapping?
      • User front-end for RELcat
        – Integration of RELcat and ISOcat?
  21. Other examples
      • “JJR” -> “POS=adjective & degree=comparative”
      • “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
      • “VVIMP” -> “POS=verb & main verb & mood=imperative”
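The right-hand sides of these examples can be read as conjunctions of feature=value terms. The sketch below parses that notation into a dictionary. The parser and its name are hypothetical, and treating a bare term such as “main verb” as a flag set to True is an assumption; the slides do not define the notation formally.

```python
def parse_features(expr):
    """Parse 'a=b & c=d' into a dict; bare terms (no '=') map to True."""
    features = {}
    for term in expr.split("&"):
        term = term.strip()
        if "=" in term:
            key, value = term.split("=", 1)
            features[key.strip()] = value.strip()
        elif term:
            features[term] = True  # assumption: a bare term is a flag
    return features


# Two of the tag mappings from the examples above.
MAPPINGS = {
    "JJR": "POS=adjective & degree=comparative",
    "VVIMP": "POS=verb & main verb & mood=imperative",
}
```

Each coarse tag then expands to several finer-grained feature values, which is the many-to-one granularity problem raised in the CGN case study.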
