ISOcat: a short introduction
ISOcat presentation at the CLARIN-NL Bijeenkomst in Utrecht on February 19, 2010.

Transcript

  • 1. ISOcat: a short introduction
    Marc Kemps-Snijders (a), Sue Ellen Wright (b), Menzo Windhouwer (a)
    (a) Max Planck Institute for Psycholinguistics, (b) Kent State University
    marc.kemps-snijders@mpi.nl, sellenwright@gmail.com, menzo.windhouwer@mpi.nl
    February 19, 2010
    CLARIN-NL Bijeenkomst
  • 2. ISOcat: a data category registry
    ISO 12620:2009
    Terminology and other content and language resources — Specification of data categories and management of a Data Category Registry for language resources
  • 3. Data category
    The result of the specification of a given data field.
    A data category is an elementary descriptor in a linguistic structure or an annotation scheme.
    The model consists of three main parts:
    • Administrative part: administration and identification
    • Descriptive part: documentation in various working languages
    • Linguistic part: conceptual domain(s) for various object languages
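The three-part model can be pictured as a small nested data structure. The sketch below is a minimal illustration under stated assumptions, not the DCIF schema: all class and field names are hypothetical, and the example values are borrowed from the partOfSpeech DCIF excerpt in this deck, except for the conceptual domain, which is made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AdministrativePart:
    """Administration and identification (hypothetical field names)."""
    identifier: str
    pid: str
    registration_status: str


@dataclass
class DescriptivePart:
    """Documentation, keyed by working language (e.g. "en")."""
    definitions: dict[str, str] = field(default_factory=dict)


@dataclass
class LinguisticPart:
    """Conceptual domain(s), keyed by object language."""
    conceptual_domains: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class DataCategory:
    admin: AdministrativePart
    descriptive: DescriptivePart
    linguistic: LinguisticPart

    def definition(self, lang: str) -> str | None:
        """Return the definition in the given working language, if documented."""
        return self.descriptive.definitions.get(lang)


# Values from the partOfSpeech excerpt; the conceptual domain is a hypothetical example.
pos = DataCategory(
    AdministrativePart("partOfSpeech", "http://www.isocat.org/datcat/DC-1345", "candidate"),
    DescriptivePart({"en": "Term used to describe how a particular word is used in a sentence."}),
    LinguisticPart({"en": ["noun", "verb"]}),
)
```

Keeping documentation keyed by working language and conceptual domains keyed by object language mirrors the split between the descriptive and linguistic parts.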
  • 4. Data Category Registry
    • ISOcat is a free service: anyone can access it, or register as an expert to create and share their own data categories.
    • Data categories can be submitted to the standardization process, in which case they are assigned to a Thematic Domain Group (TDG), which judges them.
    • At regular intervals, snapshots of the standardized subset of the DCR will be submitted to ISO.
    [Diagram: the DCR Board above a set of Thematic Domain Groups, e.g. TDG metadata, TDG morphosyntax, TDG terminology, …]
  • 5. Standardization
    [Diagram: standardization workflow. Roles: Submission group; Thematic Domain Group (stewardship group); Data Category Registry Board (decision group). Steps: Validation and Evaluation, either of which may end in "rejected", followed by Publication.]
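The submission path and the workflow diagram can be summarized as a tiny state machine. This is a simplified sketch under assumptions: the profile-to-TDG routing table and all function names are hypothetical, validation and evaluation outcomes are passed in as booleans instead of being decided by the actual groups, and resubmission after rejection is not modeled.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    REJECTED = "rejected"
    PUBLISHED = "published"


# Hypothetical routing table: which TDG judges a data category, by profile.
TDG_BY_PROFILE = {
    "Metadata": "TDG metadata",
    "MorphoSyntax": "TDG morphosyntax",
    "Terminology": "TDG terminology",
}


def assign_tdg(profile: str) -> str:
    """Assign a submitted data category to a Thematic Domain Group."""
    if profile not in TDG_BY_PROFILE:
        raise ValueError(f"no TDG known for profile {profile!r}")
    return TDG_BY_PROFILE[profile]


def standardize(profile: str, passes_validation: bool,
                passes_evaluation: bool) -> tuple[Status, list[str]]:
    """Walk one data category through validation, evaluation and publication."""
    trace = [f"submitted to {assign_tdg(profile)}"]
    if not passes_validation:
        trace.append("rejected at validation")
        return Status.REJECTED, trace
    trace.append("validated")
    if not passes_evaluation:
        trace.append("rejected at evaluation")
        return Status.REJECTED, trace
    trace.append("evaluated")
    trace.append("published")
    return Status.PUBLISHED, trace
```

For instance, `standardize("MorphoSyntax", True, True)` ends in `Status.PUBLISHED`, while a failed validation stops the walk before evaluation is ever reached.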
  • 6. Data categories and linguistic resources
    [Diagram: two resource schemas whose elements refer to data categories. A (schema for a) lexicon: Lexicon, Lexical Entry, Lemma, Word Form, Form and Sense, with multiplicities 1..* and 0..*. A (schema for a) typological database. Data categories referenced: partOfSpeech, writtenForm, grammaticalGender, lexicalType, wordOrder. The same data category (e.g. grammaticalGender) can be referenced from both schemas: shared semantics!]
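The "shared semantics" idea can be made concrete: if each schema maps its local element names to data category PIDs, elements from different schemas can be matched through the PID rather than through their names. A minimal sketch; the PIDs are placeholders (not real ISOcat identifiers), and the local name `gender` in the typological schema is hypothetical, chosen to show that matching does not depend on element names.

```python
# Placeholder PIDs, not real ISOcat identifiers.
GENDER = "http://example.org/datcat/grammaticalGender"
POS = "http://example.org/datcat/partOfSpeech"
WRITTEN = "http://example.org/datcat/writtenForm"
WORD_ORDER = "http://example.org/datcat/wordOrder"

# Local element name -> data category PID, per schema.
lexicon_schema = {
    "partOfSpeech": POS,
    "writtenForm": WRITTEN,
    "grammaticalGender": GENDER,
}
typology_schema = {
    "gender": GENDER,  # different local name, same data category
    "wordOrder": WORD_ORDER,
}


def shared_semantics(a: dict, b: dict) -> list:
    """Pair up elements of two schemas that reference the same data category."""
    names_by_pid = {}
    for name, pid in b.items():
        names_by_pid.setdefault(pid, []).append(name)
    return sorted(
        (name_a, name_b)
        for name_a, pid in a.items()
        for name_b in names_by_pid.get(pid, [])
    )
```

Here `shared_semantics(lexicon_schema, typology_schema)` pairs `grammaticalGender` with `gender`, since both point to the same PID.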
  • 7. Referencing data categories
    <dcif:dataCategory pid="http://www.isocat.org/datcat/DC-1345" type="complex">
      <dcif:administrationInformation>
        <dcif:administrationRecord>
          <dcif:identifier>partOfSpeech</dcif:identifier>
          <dcif:version>0.0.0</dcif:version>
          <dcif:registrationStatus>candidate</dcif:registrationStatus>
          <dcif:origin>?</dcif:origin>
          <dcif:creation>
            <dcif:creationDate>2004-07-09</dcif:creationDate>
            <dcif:changeDescription xml:lang="en"></dcif:changeDescription>
          </dcif:creation>
        </dcif:administrationRecord>
      </dcif:administrationInformation>
      <dcif:descriptionSection>
        <dcif:profile>MorphoSyntax</dcif:profile>
        <dcif:languageSection>
          <dcif:language>en</dcif:language>
          <dcif:definitionSection>
            <dcif:definition xml:lang="en">
              Term used to describe how a particular word is used in a sentence.
            </dcif:definition>
            <!-- … -->
          </dcif:definitionSection>
        </dcif:languageSection>
      </dcif:descriptionSection>
      <!-- … -->
    </dcif:dataCategory>
    Data category persistent identifier (PID): http://isocat.org/datcat/ISO-DC-1345
    Resolving the PID over HTTP returns a 307 redirect that depends on the requested content type:
    • HTML content type: http://www.isocat.org/rest/dc/1345.html
    • Default content type: http://www.isocat.org/rest/dc/1345.dcif
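The PID resolution step can be sketched as a function that, given a PID and the client's `Accept` header, returns the 307 status and the redirect target. This is a hypothetical server-side illustration of the behavior on the slide, not ISOcat's actual code: it only checks whether `text/html` appears in the header, whereas real content negotiation would also weigh quality values.

```python
from __future__ import annotations

import re

REST_BASE = "http://www.isocat.org/rest/dc/"  # base URL taken from the slide


def resolve(pid: str, accept: str = "") -> tuple[int, str]:
    """Return (status, location) for a data category PID.

    Simplification: HTML clients (Accept mentions text/html) are sent to the
    .html representation; everyone else gets the default .dcif document.
    """
    match = re.search(r"DC-(\d+)$", pid)
    if match is None:
        raise ValueError(f"not a data category PID: {pid}")
    number = match.group(1)
    extension = "html" if "text/html" in accept else "dcif"
    return 307, f"{REST_BASE}{number}.{extension}"
```

A browser sending `Accept: text/html,...` for `http://isocat.org/datcat/ISO-DC-1345` would be redirected to `.../rest/dc/1345.html`, and a client sending no `Accept` header to `.../rest/dc/1345.dcif`.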
  • 8. Annotating linguistic resources
    • Schema language support for equivalence, for example ODD from TEI:
      <elementSpec ident="pos">
        <equiv name="partOfSpeech" uri="http://isocat.org/datcat/ISO-DC-369"/>
      </elementSpec>
    • Annotation using the dcr:datcat attribute, for schemas or instances, for example a RelaxNG schema:
      <rng:element name="partOfSpeech" dcr:datcat="http://isocat.org/datcat/ISO-DC-369">
        <rng:choice>
          <rng:value dcr:datcat="http://isocat.org/datcat/ISO-DC-370">verb</rng:value>
          <rng:value dcr:datcat="http://isocat.org/datcat/ISO-DC-371">noun</rng:value>
        </rng:choice>
      </rng:element>
    • This is XML-oriented; is more needed?
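Because the dcr:datcat annotations are plain XML attributes, a tool can harvest them from a schema with any XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree` on the RelaxNG example; the RelaxNG namespace URI is the standard one, but the URI bound to the `dcr:` prefix is an assumption, and the function name is hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"
DCR = "http://www.isocat.org/ns/dcr"  # assumed namespace URI for the dcr: prefix

SCHEMA = f"""<rng:element xmlns:rng="{RNG}" xmlns:dcr="{DCR}"
    name="partOfSpeech" dcr:datcat="http://isocat.org/datcat/ISO-DC-369">
  <rng:choice>
    <rng:value dcr:datcat="http://isocat.org/datcat/ISO-DC-370">verb</rng:value>
    <rng:value dcr:datcat="http://isocat.org/datcat/ISO-DC-371">noun</rng:value>
  </rng:choice>
</rng:element>"""


def datcat_references(schema_xml: str) -> dict[str, str]:
    """Map element names and enumerated values to their dcr:datcat PIDs."""
    root = ET.fromstring(schema_xml)
    refs = {}
    for node in root.iter():  # includes the root element itself
        pid = node.get(f"{{{DCR}}}datcat")
        if pid is None:
            continue
        if node.tag == f"{{{RNG}}}element":
            refs[node.get("name")] = pid
        elif node.tag == f"{{{RNG}}}value":
            refs[(node.text or "").strip()] = pid
    return refs
```

Applied to `SCHEMA`, this yields one entry each for `partOfSpeech`, `verb` and `noun`, pointing at ISO-DC-369, ISO-DC-370 and ISO-DC-371 respectively.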
  • 9. Data categories as RDF resources
    :headword
        dcr:datcat <http://isocat.org/datcat/DC-258> ;
        rdfs:label "head word"@en ;
        rdfs:comment "A lemma heading a dictionary entry."@en ;
        rdfs:label "lemma"@nl ;
        rdfs:comment "Het eerste woord van een artikel in een woordenboek."@nl .
        # English: "The first word of an entry in a dictionary."
    :partOfSpeech
        dcr:datcat <http://isocat.org/datcat/DC-396> ;
        rdfs:label "part of speech"@en ;
        rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en .
    A domain modeling approach:
    :headword a rdfs:Class .
    :partOfSpeech a rdf:Property ;
        rdfs:domain :headword .
    Alternative approach:
    :headword a rdfs:Class .
    :partOfSpeech a rdfs:Class .
    :hasPartOfSpeech a rdf:Property ;
        rdfs:domain :headword ;
        rdfs:range :partOfSpeech .
    :noun a :partOfSpeech .
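The two RDF modeling choices differ in where the values live: in the alternative approach, the values of :hasPartOfSpeech are individuals such as :noun, typed by the range class. The sketch below keeps a few of the triples from the slide as plain Python tuples to stay self-contained (a real application would more likely use an RDF library such as rdflib); the helper names are hypothetical, and literals are represented as (text, language) pairs.

```python
# Triples (subject, predicate, object); "a" is written as rdf:type.
# Literals are (text, language-tag) tuples.
TRIPLES = [
    (":headword", "dcr:datcat", "<http://isocat.org/datcat/DC-258>"),
    (":headword", "rdfs:label", ("head word", "en")),
    (":headword", "rdfs:label", ("lemma", "nl")),
    (":headword", "rdf:type", "rdfs:Class"),
    (":partOfSpeech", "rdf:type", "rdfs:Class"),
    (":hasPartOfSpeech", "rdf:type", "rdf:Property"),
    (":hasPartOfSpeech", "rdfs:domain", ":headword"),
    (":hasPartOfSpeech", "rdfs:range", ":partOfSpeech"),
    (":noun", "rdf:type", ":partOfSpeech"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def label(subject, lang):
    """Pick the rdfs:label in the requested language, if there is one."""
    for text, tag in objects(subject, "rdfs:label"):
        if tag == lang:
            return text
    return None


def allowed_values(prop):
    """Alternative approach: values of a property are instances of its range."""
    ranges = objects(prop, "rdfs:range")
    return sorted(s for s, p, o in TRIPLES if p == "rdf:type" and o in ranges)
```

With these triples, `label(":headword", "nl")` gives the Dutch label "lemma", and `allowed_values(":hasPartOfSpeech")` lists `:noun`. In the domain modeling approach there is no range class to enumerate, so the permitted values would have to be recorded elsewhere.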
  • 10. ISOcat status
    ISOcat is under active development:
    • Now:
      • You can access public data categories and selections
      • You can create your own data categories and selections
      • You can share your data categories and selections with others (everyone, or a specified group)
    • In progress:
      • Cleanup of profiles by TDGs
      • Standardization workflow
      • Some social features (forum to discuss specific data categories)
      • Import external ‘data category’ sets, such as:
        • parts of the ISO Concept Database
        • Dublin Core
        • TEI
    • Future:
      • High availability (mirrors)
      • Relation registry
  • 11. ISOcat workshop
    Utrecht, Thursday March 25, 2010
    Especially aimed at supporting Call 1 projects
    Sign up at: www.clarin.nl
    Program:
    • A deeper introduction to ISOcat
    • A tutorial on using ISOcat
    • How to annotate specific linguistic resources?
    Invitation
    Send examples of the types of linguistic resources your project wants to annotate with data category references to isocat@mpi.nl, and we will discuss them at the workshop!
  • 12. Thank you for your attention!
    Visit www.isocat.org
    Questions? isocat@mpi.nl