www.isocat.org

ISOcat and RELcat:
2 cooperating Semantic Registries

Menzo Windhouwer
menzo.windhouwer@dans.knaw.nl
The Language Archive – DANS

Ineke Schuurman
ineke@ccl.kuleuven.be
KU Leuven, CLARIN-NL – Utrecht University
17 January 2014

CLIN 24


Outline

• The need for explicit semantics
– ISOcat

• Mapping issues
– Languages, theoretical frameworks
– Granularity levels
– RELcat

• CGN case study
• Conclusions and future work

Typological Database Nijmegen
TOP NOTION tds:Noun GROUPS{
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION
        <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...

Notes: TDN is not archived in TLA, but curated in TDS, a previous project Menzo worked on, and now archived at DANS; also, this is not a TDN punchcard.

DOBES corpora


ISOcat

• An open Data Category/Concept Registry where
everyone can
– find and select data categories/concepts
– create new data categories/concepts
– share data categories/concepts

• Each data category/concept has a Persistent
Identifier which can be embedded in a resource
(schema) to make the intended semantics (more)
explicit
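
To illustrate how an embedded persistent identifier might be read back from a schema, here is a minimal Python sketch. The `dcr:datcat` attribute name, its namespace URI and the identifier `DC-0000` are assumptions made for this example only; they are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and PID, used only for illustration.
DCR_NS = "http://www.isocat.org/ns/dcr"
SCHEMA = f"""
<schema xmlns:dcr="{DCR_NS}">
  <element name="pos" dcr:datcat="http://www.isocat.org/datcat/DC-0000"/>
  <element name="comment"/>
</schema>
"""

def datcat_references(schema_xml):
    """Map each element name to the data category PID it references, if any."""
    root = ET.fromstring(schema_xml)
    refs = {}
    for element in root.iter("element"):
        # Namespaced attributes are addressed as "{namespace}localname".
        pid = element.get(f"{{{DCR_NS}}}datcat")
        if pid is not None:
            refs[element.get("name")] = pid
    return refs

print(datcat_references(SCHEMA))
```

Only the `pos` element carries a reference here; the `comment` element has no explicit semantics attached and is skipped.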

Mapping issues

• Interesting resources for a specific research
question might
– use very different theoretical frameworks, which
might share few or no data categories/concepts
– use coarser- or finer-grained data
categories/concepts

• How to overcome these differences by
mapping data categories/concepts to each
other?

Some examples

• definite article (PoS)
– EN: 1 (-)
– FR: 2 (masc, fem)
– NL: 2 (neuter, non-neuter)
– DE: 3 (masc, fem, neuter)

Dutch ‘non-neuter’, for example, should be
related to ‘masc’ and ‘fem’
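
One possible reading of "should be related to" is subsumption: Dutch ‘non-neuter’ covers both ‘masc’ and ‘fem’. The Python sketch below assumes that modelling (the slide does not fix the relation type) and shows how a search value could be expanded to the values it is directly related to.

```python
# Assumed modelling: 'non-neuter' subsumes 'masc' and 'fem'.
TRIPLES = [
    ("masc", "isSubClassOf", "non-neuter"),
    ("fem", "isSubClassOf", "non-neuter"),
]

def expand(value, triples):
    """Return the value plus every category it is directly related to."""
    related = {value}
    for subject, _predicate, obj in triples:
        if subject == value:
            related.add(obj)
        elif obj == value:
            related.add(subject)
    return related

print(sorted(expand("fem", TRIPLES)))         # ['fem', 'non-neuter']
print(sorted(expand("non-neuter", TRIPLES)))  # ['fem', 'masc', 'non-neuter']
```

A search for ‘fem’ would then also reach Dutch resources that only distinguish ‘non-neuter’.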

Some examples

• Indirect object (syntax)
– EN: indirect object
– NL:
• meewerkend voorwerp (1), or
• meewerkend voorwerp (2) plus belanghebbend
voorwerp
– All translated as ‘indirect object’

=> 3 definitions of ‘indirect object’; the relations between
them still need to be shown!

Some examples

• Event (semantics)
– ISO-TimeML: event and state, where ‘state’ is a
type of event
– Other theories (Kamp & Reyle, etc.): eventuality,
with two subtypes: ‘event’ and ‘state’

Concepts ‘eventuality’, ‘event’ and ‘state’ are to
be related

ISOcat internal issues

Some data categories are almost the same,
differing only in type, profile, language, …
Currently we insert a new DC (data category). Note, however, that the
original one and the new one should be
marked as having a same-as relation


RELcat

• A Relation Registry (under construction) to store
– (almost) same-as relationships
– subsumption relationships (isSuperClassOf, isSubClassOf)
– mereology relationships (isPartOf, hasPart)
– …
between data categories/concepts
• The focus is on informal and possibly partial
ontologies to be used for resource discovery
• Based on RDF triples
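
The relationship types above come in inverse pairs, so a registry only needs one direction stated explicitly. The Python sketch below is an illustration under assumptions, not the RELcat implementation: `almostSameAs` is a hypothetical name for the (almost) same-as relationship, and the `timeml:` and `drt:` prefixes and the modelling of the event example are illustrative.

```python
# Inverse pairs for the relationship types; "almostSameAs" is a hypothetical
# name and is treated as symmetric.
INVERSE = {
    "isSubClassOf": "isSuperClassOf",
    "isSuperClassOf": "isSubClassOf",
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
    "almostSameAs": "almostSameAs",
}

def with_inverses(triples):
    """Return the triples together with their inverse statements."""
    closed = set(triples)
    for subject, predicate, obj in triples:
        closed.add((obj, INVERSE[predicate], subject))
    return closed

# Assumed modelling of the event example; prefixes are illustrative.
triples = [
    ("timeml:state", "isSubClassOf", "timeml:event"),
    ("drt:event", "isSubClassOf", "drt:eventuality"),
    ("drt:state", "isSubClassOf", "drt:eventuality"),
    ("timeml:event", "almostSameAs", "drt:eventuality"),
]

for triple in sorted(with_inverses(triples)):
    print(triple)
```

Keeping the prefixes distinct makes it visible that ‘event’ in one framework is not the same concept as ‘event’ in the other.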

CGN case study

• Atomic building blocks of CGN tags are
defined in ISOcat (still private)
• The EBNF schema of a CGN tag is stored in
SCHEMAcat
• The subsumption relations in the value
domains are stored in RELcat
• (almost) same-as relationships with other data
categories/concepts are also stored in RELcat

CGN granularity mappings

• How to deal with (almost) same-as
relationships that involve more than one
atomic CGN data category/concept?
– Example: N(SOORT) = Common Noun

• Based on the CGN EBNF this involves the
following slots of the /CGN tag/
– /PoS/ = /N/
– /NTYPE/ = /SOORT/

• How to express this in RDF?

RELcat RDF mapping

• Data categories/concepts can function as
subjects and objects in an RDF triple
• The predicate of an RDF triple is a RELcat
relationship type
• Alternative: complex data categories as
properties
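
As a small illustration of the first two points, a statement with data categories as subject and object and a relationship type as predicate can be written out in N-Triples syntax. The base URIs and local names in this Python sketch are hypothetical; real data category PIDs and RELcat relationship URIs would take their place.

```python
# Hypothetical base URIs, for illustration only.
DC = "http://example.org/datcat/"
REL = "http://example.org/relcat/"

def n_triple(subject, predicate, obj):
    """Serialize one statement in N-Triples syntax (URIs only)."""
    return f"<{DC}{subject}> <{REL}{predicate}> <{DC}{obj}> ."

print(n_triple("N_SOORT", "sameAs", "commonNoun"))
```

This covers only the simple case in which both sides of the mapping are single data categories.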


N(SOORT) = Common Noun
[Diagram: RDF graph with the nodes CGN tag and Common Noun and the edges isA and sameAs]

N(SOORT) = Common Noun
[Diagram: RDF graph in which CGN tag has the parts PoS and NTYPE (and more parts), PoS has the potential value N and NTYPE has the potential value SOORT (each with more potential values); further edges: isA, and sameAs to Common Noun]

N(SOORT) = Common Noun
[Diagram: RDF graph with the nodes CGN tag, PoS, NTYPE, N, SOORT and Common Noun and the edges isA, hasPart, hasValue, hasPotentialValue and sameAs; CGN tag has more parts, and PoS and NTYPE have more potential values]
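
The complex mapping can be sketched as a small set of triples. The Python example below follows one plausible reading of the diagram, not a confirmed one: the node names starting with `_:` are hypothetical placeholders for unnamed nodes, and the exact wiring of the isA and hasValue edges is an assumption.

```python
# One plausible reading of the diagram; "_:" names are hypothetical
# placeholders for anonymous nodes.
GRAPH = {
    ("_:tag", "isA", "CGN tag"),
    ("_:tag", "hasPart", "_:pos"),
    ("_:tag", "hasPart", "_:ntype"),
    ("_:pos", "isA", "PoS"),
    ("_:pos", "hasValue", "N"),
    ("_:ntype", "isA", "NTYPE"),
    ("_:ntype", "hasValue", "SOORT"),
    ("_:tag", "sameAs", "Common Noun"),
}

def objects(graph, subject, predicate):
    """All objects of the triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def slot_values(graph, tag):
    """Map each slot type of a complex tag to its value."""
    slots = {}
    for part in objects(graph, tag, "hasPart"):
        # Each part is assumed to have exactly one type and one value.
        (slot_type,) = objects(graph, part, "isA")
        (value,) = objects(graph, part, "hasValue")
        slots[slot_type] = value
    return slots

print(slot_values(GRAPH, "_:tag"))
print(objects(GRAPH, "_:tag", "sameAs"))
```

Eight triples are needed for a mapping that reads as a single equation on the slide, which is the complexity the deck is pointing at.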

Cooperation between
ISOcat and RELcat

• ISOcat: value domains of closed data
categories
– RELcat: hasPotentialValue (new relationship type)

• ISOcat: is-a relations between simple data
categories
– RELcat: subsumption relations

• SCHEMAcat: part-of relationships
– RELcat: mereology relationships
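
The correspondence above suggests that part of the RELcat content could be derived mechanically. The following Python sketch assumes simplified, hypothetical inputs: a dictionary of value domains standing in for ISOcat and a dictionary of schema parts standing in for SCHEMAcat. Only the values and parts named in the CGN example are listed.

```python
# Hypothetical, simplified inputs; the real value domains and the real
# CGN tag schema contain more entries than shown here.
VALUE_DOMAINS = {"PoS": ["N"], "NTYPE": ["SOORT"]}
SCHEMA_PARTS = {"CGN tag": ["PoS", "NTYPE"]}

def derive_triples(value_domains, schema_parts):
    """Turn value domains and schema parts into relation triples."""
    triples = []
    for category, values in value_domains.items():
        for value in values:
            triples.append((category, "hasPotentialValue", value))
    for whole, parts in schema_parts.items():
        for part in parts:
            triples.append((whole, "hasPart", part))
    return triples

for triple in derive_triples(VALUE_DOMAINS, SCHEMA_PARTS):
    print(triple)
```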

Conclusions and future work

• Simple mappings are easy
• Complex mappings easily become fairly complex
– UI support?
– DSL support?
– Alternative RDF mapping?

• User front-end for RELcat
– Integration of RELcat and ISOcat?

Other examples

• “JJR” -> “POS=adjective & degree=comparative”
• “Transitive” -> “thetavp=vp120 & synvps=[synNP] &
caseAssigner=True”
• “VVIMP” -> “POS=verb & main verb & mood=imperative”
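
Mappings of this kind decompose one coarse tag into several feature/value pairs. As a hedged illustration, the Python sketch below parses the right-hand sides into dictionaries. Storing terms without `=` (such as `main verb`) as `True` is an assumption of this sketch, not something the slide specifies.

```python
MAPPINGS = {
    "JJR": "POS=adjective & degree=comparative",
    "VVIMP": "POS=verb & main verb & mood=imperative",
}

def parse_features(expression):
    """Split a 'key=value & key=value' expression into a dict.

    Terms without '=' are stored with the value True (an assumption).
    """
    features = {}
    for term in expression.split("&"):
        key, sep, value = term.strip().partition("=")
        features[key.strip()] = value.strip() if sep else True
    return features

for tag, expression in MAPPINGS.items():
    print(tag, parse_features(expression))
```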
