1. www.isocat.org
What do cats have to do with explicit semantics?
Menzo Windhouwer, MPI for Psycholinguistics, menzo.windhouwer@mpi.nl
Ineke Schuurman, KU Leuven & Utrecht University, ineke@ccl.kuleuven.be
2. www.isocat.org
TTNWW and ISOcat
• TTNWW: TST Tools voor het Nederlands als
Web services in een Workflow (Language and Speech
Technology Tools for Dutch as Web Services in a Workflow)
• CLARIN-NL and CLARIN-VL pilot project
• Goal: to enable researchers in the humanities
to use our tools and resources in an easy way,
even when a whole series of tools and
resources is involved.
20 January 2012 CLIN22 - TTNWW Project 2
3. www.isocat.org
TTNWW and ISOcat
• Issues when making use of such a ‘chain’:
– Is the meaning of notion X in resource/tool A the
same as that in resource/tool B?
– Is the meaning of notion X in resource/tool A and
that of Y in resource/tool B the same?
– Or, if not the same, are they related? If so, how?
= ISOcat and friends to the rescue!
4. www.isocat.org
Explicit semantics
• Language resources are valuable assets
– store them in an archive to ensure persistence!
– later generations can study material that can
only be collected now
• Problem: used terminology might ‘rot’
– terms get a (slightly) different meaning over (long)
periods of time
– later generations need to know what the terms mean today
• Solution: make semantics explicit
5. www.isocat.org
The ISOcat Data Category Registry
http://www.isocat.org/
• An ISOcat data category is “an elementary
descriptor in a linguistic structure or an
annotation scheme” (ISO 12620:2009)
• ISOcat data categories have unique and
persistent identifiers, which can be resolved
over the web
http://www.isocat.org/datcat/DC-78
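A minimal sketch of how a tool might handle such identifiers without going over the network. The URL pattern is generalized from the single example above, which is an assumption; the function names `parse_datcat_pid` and `same_category` are hypothetical and not part of any ISOcat API.

```python
import re

# Assumption: every data category PID follows the shape of the one
# example on the slide, http://www.isocat.org/datcat/DC-<number>.
_PID_RE = re.compile(r"http://www\.isocat\.org/datcat/DC-(\d+)")


def parse_datcat_pid(pid):
    """Return the numeric key of a data category PID, or raise ValueError."""
    match = _PID_RE.fullmatch(pid)
    if match is None:
        raise ValueError(f"not a recognized data category PID: {pid!r}")
    return int(match.group(1))


def same_category(pid_a, pid_b):
    """Two references denote the same data category if their keys are equal."""
    return parse_datcat_pid(pid_a) == parse_datcat_pid(pid_b)


key = parse_datcat_pid("http://www.isocat.org/datcat/DC-78")
```

Comparing identifiers rather than local names is the point: two resources may call a notion by different names and still refer to the same data category.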
6. www.isocat.org
Annotate all elements in a linguistic resource
[Figure: a lexicon structure in which every element is annotated with a data category: /lexicon/, /language/, /alphabet/, /entry/, /japanese/, /ipa/, /lemma/, /writtenForm/]
7. www.isocat.org
Sharing structure
• Using ISOcat data category references,
specifications of elementary descriptors can
be shared between structures
• How to share (annotated) structures?
• A companion registry for ISOcat is under
development: SCHEMAcat
• This registry should persistently store any kind
of schema, e.g., XML schemata, EBNF
grammars
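A rough sketch of what shared descriptors could look like in practice: two XML schemata whose element declarations carry a data category reference, and a helper that finds the references they have in common. The `dcr:datcat` attribute and its namespace are assumptions here, the `example.org` URIs are placeholders standing in for real ISOcat PIDs, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DCR = "http://www.isocat.org/ns/dcr"  # assumed namespace for the reference attribute

# Two toy schemata; the example.org URIs are placeholders, not real PIDs.
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                         xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="lemma" dcr:datcat="http://example.org/datcat/lemma"/>
  <xs:element name="writtenForm" dcr:datcat="http://example.org/datcat/writtenForm"/>
  <xs:element name="note"/>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                         xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="headword" dcr:datcat="http://example.org/datcat/lemma"/>
</xs:schema>"""


def datcat_refs(schema_text):
    """Map each declared element name to its data category reference, if any."""
    root = ET.fromstring(schema_text)
    refs = {}
    for elem in root.iter(f"{{{XS}}}element"):
        pid = elem.get(f"{{{DCR}}}datcat")
        if pid is not None:
            refs[elem.get("name")] = pid
    return refs


def shared_categories(refs_a, refs_b):
    """Return the data category references used by both schemata."""
    return set(refs_a.values()) & set(refs_b.values())


refs_a = datcat_refs(SCHEMA_A)
refs_b = datcat_refs(SCHEMA_B)
common = shared_categories(refs_a, refs_b)
```

Here `lemma` in the first schema and `headword` in the second turn out to share a descriptor even though their local names differ, while the unannotated `note` element stays outside the comparison.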
9. www.isocat.org
Sharing relations
• Ontological relationships can be defined among
data categories and (other) concepts
• These relationships allow crosswalks between
various resource models
– discover related resources which use (different levels
of) semantically close data categories
• RELcat is a companion registry which will allow
storing (and sharing) a linguist's individual view on
these relationships
http://lux13.mpi.nl/relcat/ (alpha)
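A minimal sketch of a crosswalk over such relations, assuming one linguist's view is a list of (subject, relation, object) triples. The relation names `sameAs`, `almostSameAs` and `related`, the category labels, and the function names are all hypothetical, and treating every relation as symmetric is a simplification made only for this illustration.

```python
from collections import defaultdict, deque


def build_index(relations):
    """Index triples by both ends so a crosswalk can follow them either way."""
    index = defaultdict(set)
    for subj, rel, obj in relations:
        index[subj].add((rel, obj))
        index[obj].add((rel, subj))  # simplification: every relation is symmetric
    return index


def crosswalk(index, start, allowed):
    """Return all categories reachable from start via the allowed relation types."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for rel, other in index.get(node, ()):
            if rel in allowed and other not in seen:
                seen.add(other)
                queue.append(other)
    seen.discard(start)
    return seen


# One linguist's view: notion X in resource A, X in B, Y in C, Z in D.
view = [
    ("A:X", "sameAs", "B:X"),
    ("B:X", "almostSameAs", "C:Y"),
    ("C:Y", "related", "D:Z"),
]
index = build_index(view)
strict = crosswalk(index, "A:X", {"sameAs"})
loose = crosswalk(index, "A:X", {"sameAs", "almostSameAs"})
```

Choosing which relation types to follow is how the "different levels" of semantic closeness show up: a strict crosswalk only reaches `B:X`, while a looser one also reaches `C:Y`.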
10. www.isocat.org
Semantic network
[Figure: semantic network linking a linguistic resource (schema) to a linguistic knowledge base. Labels: data categories, containers, concepts, relation. Registries: Schema Registry (SCHEMAcat), Data Category Registry (ISOcat), Concept Registry, Relation Registry (RELcat).]
11. www.isocat.org
Conclusion
• CLARIN(-NL/-VL), including TTNWW, is working
towards a set of registries that enable the
community to collaboratively make semantics
explicit by:
– sharing elementary descriptors: data categories
• persistently
– sharing structure: schemata
• persistently
– sharing ontological relations
• individual world views
12. www.isocat.org
What do cats have to do with explicit semantics?
13. www.isocat.org
Thank you for your attention!
Visit
www.isocat.org
Questions?
www.isocat.org/forum/
isocat@mpi.nl
Editor's Notes
MENZO: This takes /animacy/ from Sue Ellen as an example. Maybe there is one that lies closer to CGN.
MENZO: Perhaps an example that is closer to CGN, i.e., a resource that also contains a CGN tag?
ISOcats are here to stay ... and hopefully sometime soon at least some of them will stay unmodified for decades