What do cats have to do with explicit semantics?

Menzo Windhouwer, senior scientific software engineer at Data Archiving and Networked Services (DANS)
www.isocat.org




             What do cats have to do with
                 explicit semantics?

            Menzo Windhouwer                 Ineke Schuurman
             MPI for Psycholinguistics   KU Leuven & Utrecht University
            menzo.windhouwer@mpi.nl          ineke@ccl.kuleuven.be

                       TTNWW and ISOcat
     • TTNWW: TST Tools voor het Nederlands als
       Web services in een Workflow
     • CLARIN-NL and VL pilot project
     • Goal: to enable researchers in the humanities
       to use our tools and resources in an easy way,
       even when a whole series of tools and
       resources is involved.


     20 January 2012        CLIN22 - TTNWW Project      2

                       TTNWW and ISOcat
     • Issues when making use of such a ‘chain’:

           – Is the meaning of notion X in resource/tool A the
             same as that in resource/tool B ?
           – Is the meaning of notion X in resource/tool A and
             that of Y in resource/tool B the same?
           – Or, if not the same, are they related? If so, how?

     => ISOcat and friends to the rescue!


                       Explicit semantics
     • Language resources are valuable assets
           – store them in an archive to ensure persistence!
           – later generations can research material that can only
             be collected now
     • Problem: the terminology used might ‘rot’
           – terms get a (slightly) different meaning over (long)
             periods of time
           – later generations need to know what the terms mean today
     • Solution: make semantics explicit


        The ISOcat Data Category Registry
                  http://www.isocat.org/
     • An ISOcat data category is “an elementary
       descriptor in a linguistic structure or an
       annotation scheme” (ISO 12620:2009)
     • ISOcat data categories have unique and
       persistent identifiers, which can be resolved
       over the web
                       http://www.isocat.org/datcat/DC-78
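
A minimal sketch of handling such identifiers, assuming only the URL pattern visible in the example above (`http://www.isocat.org/datcat/DC-<number>`). The function names are illustrative and no network request is made; resolving the identifier over the web is left out.

```python
import re

# Pattern inferred from the example PID on the slide; an assumption, not a spec.
PID_PATTERN = re.compile(r"^http://www\.isocat\.org/datcat/DC-(\d+)$")


def datcat_number(pid: str) -> int:
    """Return the numeric part of an ISOcat data category PID."""
    match = PID_PATTERN.match(pid)
    if match is None:
        raise ValueError(f"not an ISOcat data category PID: {pid!r}")
    return int(match.group(1))


def datcat_pid(number: int) -> str:
    """Build a PID string from a data category number."""
    return f"http://www.isocat.org/datcat/DC-{number}"


print(datcat_number("http://www.isocat.org/datcat/DC-78"))
print(datcat_pid(78))
```

Keeping the identifier as an opaque string and parsing it only when needed means a resource can store the full PID and stay readable even if the tooling around it changes.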


         Annotate all elements in a linguistic resource

         /lexicon/
           /language/
             /japanese/
           /alphabet/
             /ipa/
           /entry/
             /lemma/
               /writtenForm/
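
One way to picture the annotated structure is as a small tree whose nodes each carry a data category name. The sketch below is an assumption-laden illustration: the `Element` class and `walk` function are hypothetical, the nesting follows one reading of the diagram layout, and the nodes hold data category names rather than resolved identifiers, since the slide gives none.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Element:
    """A node in a resource structure, annotated with a data category name."""
    datcat: str  # a name such as "/lexicon/", not a resolved PID
    children: list[Element] = field(default_factory=list)


def walk(node: Element, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, data category name) pairs in depth-first order."""
    yield depth, node.datcat
    for child in node.children:
        yield from walk(child, depth + 1)


# Nesting as read from the diagram; this grouping is an interpretation.
lexicon = Element("/lexicon/", [
    Element("/language/", [Element("/japanese/")]),
    Element("/alphabet/", [Element("/ipa/")]),
    Element("/entry/", [Element("/lemma/", [Element("/writtenForm/")])]),
])

for depth, name in walk(lexicon):
    print("  " * depth + name)
```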

                       Sharing structure
     • Using ISOcat data category references,
       specifications of elementary descriptors can
       be shared between structures
     • How to share (annotated) structures?
     • A companion registry for ISOcat is under
       development: SCHEMAcat
     • This registry should persistently store any kind
       of schema, e.g., XML schemata, EBNF
       grammars
                                 Annotated CGN/DCOI grammar
tag = pos '(' feat* ')'
# @dcr:datcat 'WW' http://www.isocat.org/datcat/DC-1424
# @dcr:datcat 'TW' http://www.isocat.org/datcat/DC-1334
# @dcr:datcat 'VG' http://www.isocat.org/datcat/DC-1226
# @dcr:datcat 'TSW' http://www.isocat.org/datcat/DC-2717
pos = 'N' | 'ADJ' | 'WW' | 'TW' | 'VNW' | 'LID' | 'VZ' | 'VG' | 'BW' | 'TSW'
feat = 'NTYPE' | 'GETAL' | 'GRAAD' | 'GENUS' | 'NAAMVAL' | 'POSITIE' | 'BUIGING' | 'GETAL-N' | 'WVORM' | 'PVTIJD' | 'PVAGR' | 'NUMTYPE' | 'VWTYPE' | 'PDTYPE' |
       'PERSOON' | 'STATUS' | 'NPAGR' | 'LWTYPE' | 'VZTYPE' | 'CONJTYPE' | 'SPECTYPE'
NTYPE = 'soortnaam' | 'eigennaam'
GETAL = 'enkelvoud' | 'meervoud' | 'getal'
GRAAD = 'basis' | 'comparatief' | 'superlatief' | 'diminutief'
GENUS = 'genus' | 'zijdig' | 'masculien' | 'feminien' | 'onzijdig'
NAAMVAL = 'standaard' | 'nominatief' | 'oblique' | 'bijzonder' | 'genitief' | 'datief'
POSITIE = 'prenominaal' | 'nominaal' | 'postnominaal' | 'vrij'
BUIGING = 'zonder' | 'met-e' | 'met-s'
GETAL-N = 'zonder-n' | 'meervoud-n'
WVORM = 'persoonsvorm' | 'buigbaar' | 'infinitief' | 'onvdw' | 'voltdw'
# @dcr:datcat PVTIJD http://www.isocat.org/datcat/DC-1286
# @dcr:datcat 'verleden' http://www.isocat.org/datcat/DC-1347
# @dcr:datcat 'conjunctief' http://www.isocat.org/datcat/DC-1843
PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief'
PVAGR = 'enkelvoud' | 'meervoud' | 'met-t'
NUMTYPE = 'hoofdtelwoord' | 'rangtelwoord'
VWTYPE = 'pr' | 'persoonlijk' | 'reflexief' | 'reciprook' | 'bezittelijk' | 'vb' | 'vragend' | 'betrekkelijk' | 'exclamatief' | 'aanwijzend' | 'onbepaald'
PDTYPE = 'pronomen' | 'adv-pronomen' | 'determiner' | 'gradeerbaar'
PERSOON = 'persoon' | '1' | '2' | '2v' | '2b' | '3' | '3p' | '3' | '3v' | '3o'
STATUS = 'vol' | 'gereduceerd' | 'nadruk'
NPAGR = 'agr' | 'evon' | 'rest' | 'evz' | 'mv' | 'agr3' | 'evmo' | 'rest3' | 'evf' | 'mv'
LWTYPE = 'bepaald' | 'onbepaald'
VZTYPE = 'initieel' | 'versmolten' | 'finaal'
CONJTYPE = 'nevenschikkend' | 'onderschikkend'
SPECTYPE = 'afgebroken' | 'onverstaanbaar' | 'vreemd' | 'deeleigen' | 'meta' | 'commentaar' | 'achtergrond' | 'afkorting' | 'symbool' | 'dialect'
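
The `# @dcr:datcat` comment lines in the grammar above can be collected into a lookup table from grammar symbol to data category identifier. The following is a minimal sketch under stated assumptions: the comment syntax is taken from the slide as shown, the function name `datcat_annotations` is hypothetical, and the short `GRAMMAR` excerpt stands in for a full schema file.

```python
import re

# Short excerpt in the style of the annotated grammar; a stand-in for a real file.
GRAMMAR = """\
# @dcr:datcat 'WW' http://www.isocat.org/datcat/DC-1424
# @dcr:datcat 'TW' http://www.isocat.org/datcat/DC-1334
pos = 'N' | 'ADJ' | 'WW' | 'TW'
# @dcr:datcat PVTIJD http://www.isocat.org/datcat/DC-1286
PVTIJD = 'tegenwoordig' | 'verleden' | 'conjunctief'
"""

# One annotation per comment line: marker, symbol, identifier.
ANNOTATION = re.compile(r"^#\s*@dcr:datcat\s+(\S+)\s+(\S+)\s*$")


def datcat_annotations(text: str) -> dict[str, str]:
    """Map each annotated grammar symbol to its data category identifier."""
    mapping: dict[str, str] = {}
    for line in text.splitlines():
        match = ANNOTATION.match(line)
        if match:
            # Drop straight or typographic quotes around terminal symbols.
            symbol = match.group(1).strip("'\u2018\u2019")
            mapping[symbol] = match.group(2)
    return mapping


for symbol, pid in datcat_annotations(GRAMMAR).items():
    print(symbol, "->", pid)
```

Note that this sketch puts quoted terminals and unquoted rule names in the same table; a fuller version would keep them apart in case a terminal and a rule share a name.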

                       Sharing relations
     • Ontological relationships can be defined among
       data categories and (other) concepts
     • These relationships allow crosswalks between
       various resource models
           – discover related resources which use (different levels
             of) semantically close data categories
     • RELcat is a companion registry which will allow
       storing (and sharing) a linguist's individual view on
       these relationships
              http://lux13.mpi.nl/relcat/ (alpha)
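
The crosswalk idea can be sketched as a walk over relation triples. Everything concrete in the example is an assumption made for illustration: the identifiers (`resourceA:noun`, `dc:noun`, and so on) and the relation type names are placeholders, not actual RELcat content, and treating every relation as undirected is a simplification.

```python
from collections import deque

# Hypothetical relation triples: (subject, relation type, object).
RELATIONS = [
    ("resourceA:noun", "sameAs", "dc:noun"),
    ("resourceB:N", "sameAs", "dc:noun"),
    ("resourceB:properNoun", "subClassOf", "dc:noun"),
    ("resourceC:verb", "sameAs", "dc:verb"),
]


def related(start, relations, allowed=frozenset({"sameAs"})):
    """Return everything reachable from `start` via the allowed relation types."""
    # Build an undirected adjacency map restricted to the allowed types.
    neighbours = {}
    for subject, relation, obj in relations:
        if relation in allowed:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)

    # Breadth-first search from the starting category.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in neighbours.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


print(sorted(related("resourceA:noun", RELATIONS)))
print(sorted(related("resourceA:noun", RELATIONS, allowed={"sameAs", "subClassOf"})))
```

Passing a wider `allowed` set corresponds to accepting looser levels of semantic closeness, and swapping in a different `RELATIONS` list corresponds to adopting a different linguist's view.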

                          Semantic network
   [Figure: semantic network.
    Panels: Linguistic resource (schema); Linguistic knowledge base.
    Legend: Data categories; Containers; Concepts; Relation.
    Registries: Schema Registry - SCHEMAcat; Data Category Registry - ISOcat;
                Concept Registry; Relation Registry - RELcat.]

                                   Conclusion
     • CLARIN(-NL/-VL), including TTNWW, is working
       towards a set of registries that enable the
       community to collaboratively make semantics
       explicit by:
           – sharing elementary descriptors: data categories
                  • persistently
           – sharing structure: schemata
                  • persistently
           – sharing ontological relations
                  • individual world views



     What do cats have to do with explicit semantics?








                       Thank you for your attention!

                                   Visit
                               www.isocat.org

                                 Questions?
                            www.isocat.org/forum/
                               isocat@mpi.nl





Editor's Notes

  1. MENZO: This takes /animacy/ from Sue Ellen as an example. Perhaps there is one that is closer to CGN.
  2. MENZO: Perhaps an example that is closer to CGN, i.e., a resource that also contains a CGN tag?
  3. ISOcats are here to stay ... and hopefully, sometime soon, at least some of them will remain unmodified for decades