ISOcat and RELcat, two cooperating semantic registries
M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.
www.isocat.org
ISOcat and RELcat:
2 cooperating Semantic Registries
Menzo Windhouwer
menzo.windhouwer@dans.knaw.nl
The Language Archive – DANS
Ineke Schuurman
ineke@ccl.kuleuven.be
KU Leuven, CLARIN-NL – Utrecht University
17 January 2014
CLIN 24
Outline
• The need for explicit semantics
– ISOcat
• Mapping issues
– Languages, theoretical frameworks
– Granularity levels
– RELcat
• CGN case study
• Conclusions and future work
Typological Database Nijmegen
TOP NOTION tds:Noun GROUPS {
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION
        <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...
Notes: TDN is not archived in TLA, but is curated in TDS, a previous project Menzo worked on, and is now archived at DANS; also, this is not a TDN punchcard.
ISOcat
• An open Data Category/Concept Registry where everyone can
– find and select data categories/concepts
– create new data categories/concepts
– share data categories/concepts
• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
Mapping issues
• Interesting resources for a specific research question might
– use very different theoretical frameworks, which might share few or no data categories/concepts
– use coarser- or finer-grained data categories/concepts
• How to overcome these differences by mapping data categories/concepts to each other?
Some examples
• definite article (PoS)
– EN: 1 (-)
– FR: 2 (masc, fem)
– NL: 2 (neuter, non-neuter)
– DE: 3 (masc, fem, neuter)
Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
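As a rough illustration of what such a granularity mapping buys a user, the sketch below indexes the one relation named on this slide (non-neuter is related to masc and fem) in both directions, so that a search for one value can be widened to its related values. The function names and the bare string labels are assumptions made for this example; real data categories would be referenced by their persistent identifiers.

```python
from collections import defaultdict

# (coarse, fine) pairs; only the relation stated on the slide is listed.
PAIRS = [("non-neuter", "masc"), ("non-neuter", "fem")]


def build_index(pairs):
    """Index (coarse, fine) pairs in both directions."""
    finer = defaultdict(set)
    coarser = defaultdict(set)
    for coarse, fine in pairs:
        finer[coarse].add(fine)
        coarser[fine].add(coarse)
    return finer, coarser


def expand(value, finer, coarser):
    """Return the value plus every directly related value, sorted."""
    related = finer.get(value, set()) | coarser.get(value, set())
    return sorted({value} | related)


FINER, COARSER = build_index(PAIRS)

print(expand("non-neuter", FINER, COARSER))
print(expand("masc", FINER, COARSER))
```

A query for Dutch ‘non-neuter’ is widened to ‘masc’ and ‘fem’, while a query for ‘masc’ is widened to ‘non-neuter’; values without any registered relation are returned unchanged.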
Some examples
• Indirect object (syntax)
– EN: indirect object
– NL:
• meewerkend voorwerp (1), or
• meewerkend voorwerp (2) plus belanghebbend voorwerp
– All translated as ‘indirect object’
=> 3 definitions of ‘indirect object’; the relations between them need to be made explicit!
Some examples
• Event (semantics)
– ISO-TimeML: event and state, where ‘state’ is a type of event
– Other theories (Kamp & Reyle, etc.): eventuality, with two subtypes: ‘event’ and ‘state’
The concepts ‘eventuality’, ‘event’ and ‘state’ need to be related
ISOcat internal issues
Data categories that are almost the same, apart from type, profile, language, …
Currently we insert a new DC. But note that the original one and the new one should be marked as having a same-as relation
RELcat
• A Relation Registry (under construction) to store
– (almost) same-as relationships
– subsumption relationships (isSuperClassOf, isSubClassOf)
– mereology relationships (isPartOf, hasPart)
– …
between data categories/concepts
• The focus is on informal and possibly partial ontologies to be used for resource discovery
• Based on RDF triples
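To make the triple-based storage more concrete, here is a minimal sketch that models relations as plain (subject, predicate, object) tuples rather than with an RDF library. The relationship type names are the ones listed on this slide; the pairing of each type with its inverse, the helper functions, and the use of bare labels instead of persistent identifiers are assumptions for illustration and do not describe the actual RELcat implementation. The example data reuses the eventuality/event/state case from the earlier slide.

```python
# Assumed inverse pairs for the relationship types named on the slide.
INVERSE = {
    "isSuperClassOf": "isSubClassOf",
    "isSubClassOf": "isSuperClassOf",
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
}


def with_inverses(triples):
    """Return the triples plus the inverse of every invertible triple."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            result.add((obj, INVERSE[predicate], subject))
    return result


def subclasses(concept, triples):
    """Collect every concept transitively subsumed by `concept`."""
    found, todo = set(), [concept]
    while todo:
        current = todo.pop()
        for subject, predicate, obj in triples:
            if (subject == current and predicate == "isSuperClassOf"
                    and obj not in found):
                found.add(obj)
                todo.append(obj)
    return found


TRIPLES = with_inverses({
    ("eventuality", "isSuperClassOf", "event"),
    ("eventuality", "isSuperClassOf", "state"),
})

print(sorted(subclasses("eventuality", TRIPLES)))
```

Storing only one direction and deriving the inverse keeps the registered relations small; whether a relation registry should materialise inverses or compute them at query time is a design choice this sketch leaves open.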
CGN case study
• Atomic building blocks of CGN tags are defined in ISOcat (still private)
• The EBNF schema of a CGN tag is stored in SCHEMAcat
• The subsumption relations in the value domains are stored in RELcat
• (almost) same-as relationships with other data categories/concepts are also stored in RELcat
CGN granularity mappings
• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
– Example: N(SOORT) = Common Noun
• Based on the CGN EBNF this involves the following slots of the /CGN tag/
– /PoS/ = /N/
– /NTYPE/ = /SOORT/
• How to express this in RDF?
RELcat RDF mapping
• Data categories/concepts can function as subjects and objects in an RDF triple
• The predicate of an RDF triple is a RELcat relationship type
• Alternative: complex data categories as properties
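One possible reading of this mapping, applied to the N(SOORT) = Common Noun example, is sketched below. It introduces an intermediate node for the composite tag and one node per filled slot, and links them with isA, hasPart, hasValue and sameAs, which are relation labels used in this deck. The node naming scheme, the function names, and the exact graph shape are assumptions for illustration only; the sketch uses plain tuples instead of an RDF library and bare labels instead of persistent identifiers.

```python
def encode_tag(tag_id, slots, same_as):
    """Build triples for one composite CGN tag.

    `slots` maps a slot category (e.g. "PoS") to its value (e.g. "N").
    The slot node identifiers are hypothetical and derived from `tag_id`.
    """
    triples = [(tag_id, "isA", "CGN tag"), (tag_id, "sameAs", same_as)]
    for slot, value in slots.items():
        slot_id = f"{tag_id}/{slot}"
        triples += [
            (tag_id, "hasPart", slot_id),
            (slot_id, "isA", slot),
            (slot_id, "hasValue", value),
        ]
    return triples


def same_as_for(slots, triples):
    """Return concepts that are sameAs a tag node with exactly these slots."""
    results = []
    tag_ids = [s for s, p, o in triples if p == "isA" and o == "CGN tag"]
    for tag_id in tag_ids:
        parts = [o for s, p, o in triples if s == tag_id and p == "hasPart"]
        found = {}
        for part in parts:
            kind = next(o for s, p, o in triples if s == part and p == "isA")
            value = next(o for s, p, o in triples
                         if s == part and p == "hasValue")
            found[kind] = value
        if found == slots:
            results += [o for s, p, o in triples
                        if s == tag_id and p == "sameAs"]
    return results


TRIPLES = encode_tag("N(SOORT)", {"PoS": "N", "NTYPE": "SOORT"}, "Common Noun")

print(same_as_for({"PoS": "N", "NTYPE": "SOORT"}, TRIPLES))
```

Even this small case needs eight triples for a single equivalence, which gives a feel for how quickly complex mappings grow when every slot of a tag has to be spelled out.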
N(SOORT) = Common Noun
[Diagram 1: RDF graph over the data categories. Nodes: CGN tag, PoS, NTYPE, N, SOORT, Common Noun. Edge labels: isA, hasPart, hasPotentialValue, sameAs. PoS and NTYPE each have more potential values, and the CGN tag has more parts.]
[Diagram 2: the same graph extended with additional isA, hasPart and hasValue edges.]
Cooperation between ISOcat and RELcat
• ISOcat: value domains of closed data categories
– RELcat: hasPotentialValue (new relationship type)
• ISOcat: is-a relations between simple data categories
– RELcat: subsumption relations
• SCHEMAcat: part-of relationships
– RELcat: mereology relationships
Conclusions and future work
• Simple mappings are easy
• Complex mappings quickly become fairly complex
– UI support?
– DSL support?
– Alternative RDF mapping?
• User front-end for RELcat
– Integration of RELcat and ISOcat?