2.
Outline
- Introduction to ISOcat, an ISO 12620:2009 compliant Data Category Registry (DCR)
- ISOcat and the Lexical Markup Framework (LMF; ISO 24613:2008)
- ISOcat and TEI (Dictionaries)
TEI Lexical workshop - Würzburg, Germany
12 October 2011
3.
ISO 12620:2009
- Terminology and other content and language resources - Specification of data categories and management of a Data Category Registry for language resources
- Replaces ISO 12620:1999, a hardcoded list of Data Categories, with a registry for (standardized) Data Categories
4.
What is a Data Category?
- The result of the specification of a given data field
- A data category is an elementary descriptor in a linguistic structure or an annotation scheme.
6.
What is a Data Category Registry?
www.isocat.org
- A (coherent) set of Data Categories, in our case for linguistic resources
7.
ISOcat and LMF
§ 4.4 ISO 12620 Data Category Registry (DCR)
“The designers of an LMF conformant lexicon shall use data categories from the ISO 12620 Data Category Registry (DCR) located at www.isocat.org.”
§ 5.4 LMF data category selection procedures
Create a Data Category Selection
Add Data Categories to ISOcat if needed
- Missing: how to refer to ISOcat Data Categories?
8.
Data Category identifiers are ambiguous
…
<LexicalEntry>
  <feat att="partOfSpeech" val="commonNoun"/>
  …
ISOcat contains two exact matches for “commonNoun” and one close match:
9.
Why are identifiers ambiguous?
Several thematic domains can use the same name for a (slightly) different Data Category
This was already true in the predecessor of ISOcat, SYNTAX (legacy)
There may be multiple versions of the same Data Category
Due to semantic drift or rot, the name cannot simply point to the latest version
Users can also create Data Categories with the same name
In the future, users may even copy a Data Category to extend its conceptual domain
- The identifier should have been renamed, e.g., to “mnemonic”
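The difference between a name lookup and a PID lookup can be sketched with a toy registry. This is only an illustration, not how ISOcat is implemented: the `DataCategory` class, its fields, the placeholder PIDs `DC-X1` and `DC-X2`, and the close-match identifier `common_noun` are all hypothetical. Only DC-1256 and its author come from the slides.

```python
from dataclasses import dataclass

ISOCAT_BASE = "http://www.isocat.org/datcat/"


@dataclass(frozen=True)
class DataCategory:
    pid: str         # persistent identifier, unique per Data Category (version)
    identifier: str  # mnemonic name, not guaranteed to be unique
    owner: str       # hypothetical field: who created the entry


# Toy registry: two exact matches for "commonNoun" and one close match.
# DC-X1 and DC-X2 are placeholders, not real ISOcat PIDs.
REGISTRY = [
    DataCategory(ISOCAT_BASE + "DC-1256", "commonNoun", "Gil Francopoulo"),
    DataCategory(ISOCAT_BASE + "DC-X1", "commonNoun", "hypothetical other domain"),
    DataCategory(ISOCAT_BASE + "DC-X2", "common_noun", "hypothetical user"),
]


def find_by_identifier(identifier):
    """Look up by name; may return zero, one, or several Data Categories."""
    return [dc for dc in REGISTRY if dc.identifier == identifier]


def find_by_pid(pid):
    """Look up by PID; returns exactly one Data Category or raises."""
    matches = [dc for dc in REGISTRY if dc.pid == pid]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match for {pid}, got {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    print(len(find_by_identifier("commonNoun")), "exact name matches")
    print(find_by_pid(ISOCAT_BASE + "DC-1256"))
```

A resource that stores only the name cannot tell which of the matches was meant. A resource that stores the PID can.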
10.
ISOcat Data Category PIDs are unique
Each ISOcat Data Category (version) has a unique PID
http://www.isocat.org/datcat/DC-1256
- /common noun/ by Gil Francopoulo
ISO 12620:2009 Annex A provides a small vocabulary to annotate an XML document with Data Category PID references:
<feat
  att="partOfSpeech"
  dcr:datcat="http://www.isocat.org/datcat/DC-1345"
  val="commonNoun"
  dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"
/>
- Preferably annotate the schema of the resource
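A consumer of such an annotated resource could read the PID references back out as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The namespace URI bound to the `dcr:` prefix is an assumption and should be checked against ISO 12620:2009 Annex A. The function name `datcat_references` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the dcr: prefix (verify against ISO 12620:2009 Annex A).
DCR_NS = "http://www.isocat.org/ns/dcr"

SAMPLE = f"""
<LexicalEntry xmlns:dcr="{DCR_NS}">
  <feat att="partOfSpeech"
        dcr:datcat="http://www.isocat.org/datcat/DC-1345"
        val="commonNoun"
        dcr:valueDatcat="http://www.isocat.org/datcat/DC-1256"/>
</LexicalEntry>
"""


def datcat_references(xml_text):
    """Return (att, att PID, val, val PID) for every <feat> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for feat in root.iter("feat"):
        rows.append((
            feat.get("att"),
            # Namespaced attributes are addressed as "{namespace-uri}localname".
            feat.get(f"{{{DCR_NS}}}datcat"),
            feat.get("val"),
            feat.get(f"{{{DCR_NS}}}valueDatcat"),
        ))
    return rows


if __name__ == "__main__":
    for row in datcat_references(SAMPLE):
        print(row)
```

The names `partOfSpeech` and `commonNoun` remain readable mnemonics, while the two PIDs carry the unambiguous reference.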
13.
TEI and ISOcat Data Category PIDs
Is TEI open to attributes from foreign namespaces?
- dcr:* attributes can already be used
Or can the dcr:* attributes be part of the global attribute list?
- It would make it possible to annotate any TEI element, incl. Dictionary elements, with a Data Category reference
- The DCR data model now also includes container Data Categories and can thus also cover inner nodes
- Could also (partially?) be done by <equiv/> statements in the ODD files
- Scripts to do this (semi-)automatically have already been created
Or can at least the TEI/ISO feature structure part accept dcr:* attributes?
- Add a DCR specific attribute list?
- Would make the ISO TC 37 standards ISO 24610-1, ISO 24613:2008 and ISO 12620:2009 consistent
Could also be another TEI attribute that expresses equivalence with an external (URI) specification (like <equiv/> in ODD) and which isn’t as tightly bound to ISOcat as the dcr:* attributes imply
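The <equiv/> route could be scripted along the following lines. This sketch is not the script mentioned on the slide. It assumes an ODD fragment with empty `elementSpec` elements and a hand-made mapping from TEI element names to ISOcat PIDs. Pairing the TEI `pos` element with DC-1345 is an illustrative guess based on the partOfSpeech example, and `add_equivs` and `dict_with_datcats` are hypothetical names. Elements without a mapping are returned for manual review, which is the "semi" in semi-automatic.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI elements without a prefix

# Hypothetical mapping from TEI element names to ISOcat PIDs.
TEI_TO_DATCAT = {"pos": "http://www.isocat.org/datcat/DC-1345"}

# Hypothetical ODD fragment; a real ODD would contain much more.
ODD_FRAGMENT = f"""
<schemaSpec xmlns="{TEI_NS}" ident="dict_with_datcats">
  <elementSpec ident="pos" mode="change"/>
  <elementSpec ident="gen" mode="change"/>
</schemaSpec>
"""


def add_equivs(odd_xml, mapping):
    """Add <equiv uri="..."/> to each mapped elementSpec.

    Returns the serialized ODD and the idents that had no mapping.
    """
    root = ET.fromstring(odd_xml)
    unmapped = []
    for spec in root.iter(f"{{{TEI_NS}}}elementSpec"):
        ident = spec.get("ident")
        pid = mapping.get(ident)
        if pid is None:
            unmapped.append(ident)  # left for a human to resolve
            continue
        equiv = ET.SubElement(spec, f"{{{TEI_NS}}}equiv")
        equiv.set("uri", pid)
    return ET.tostring(root, encoding="unicode"), unmapped


if __name__ == "__main__":
    odd, todo = add_equivs(ODD_FRAGMENT, TEI_TO_DATCAT)
    print(odd)
    print("needs manual mapping:", todo)
```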
14.
Thank you for your attention!
Visit www.isocat.org
Questions? Menzo.Windhouwer@mpi.nl