Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone
This paper gives guidelines on how to create and update Propbank frameset files using a dedicated editor, Cornerstone. Propbank is a corpus in which the arguments of each verb predicate are annotated with their semantic roles in relation to the predicate. Propbank annotation also requires choosing a sense ID for each predicate. Thus, for each predicate in Propbank, there is a corresponding frameset file showing the expected predicate argument structure of each sense of that predicate. Since most Propbank annotations are based on the predicate argument structure defined in the frameset files, it is important to keep the files consistent, simple to read, and easy to update. The frameset files are written in XML, which can be difficult to edit in a simple text editor. It is therefore helpful to have a user-friendly editor such as Cornerstone, customized specifically for creating and editing frameset files. Cornerstone is platform independent, is light enough to run as an X11 application, and supports multiple languages, including Arabic, Chinese, English, Hindi and Korean.


Transcript

  • 1. Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone
    Jinho Choi, Claire Bonial, Martha Palmer
    Institute of Cognitive Science, University of Colorado at Boulder
    Propbank
    Frameset Files
    Cornerstone
    Advantages and Features
    More about Cornerstone
    Acknowledgements
    • A corpus in which the arguments of each verb predicate are annotated with their semantic roles.
    • Each predicate is also annotated with its sense ID, supplied through frameset files.
    • Frameset files outline the argument structure for each sense of every predicate in Propbank.
    • Annotators use the semantic and syntactic information provided in frameset files to make consistent annotations efficiently.
    How to obtain Cornerstone
    • Available as an open source project on Google Code (http://code.google.com/p/propbank).
    • The project also provides a Propbank instance editor, Jubilee.
    • Both Cornerstone and Jubilee have been used in several universities.
    • Contact: choijd@colorado.edu
    • We gratefully acknowledge the support of the National Science Foundation Grants CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation and CISE-CRI 0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
    Propbank annotations are based on the predicate argument structure outlined in frameset files
    Including verb-particle constructions
    One per verb
    All verbs in Propbank
    It is important to keep the frameset files consistent, simple to read, and easy to update.
    opened: open.01
    Multi-lemma mode
    • A predicate can have multiple lemmas (e.g., open, open up).
    • Languages: English, Hindi
    Uni-lemma mode
    • A predicate can have only one lemma.
    • Languages: Arabic, Chinese
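
    The difference between the two modes can be sketched as a small check on the list of lemmas attached to a predicate. This is only an illustration of the rule stated above, not Cornerstone's own code: the function name `validate_lemmas` and the mode strings "multi" and "uni" are hypothetical.

```python
def validate_lemmas(lemmas, mode):
    """Check a predicate's lemma list against the chosen mode.

    `mode` is a hypothetical flag: "multi" for multi-lemma mode,
    "uni" for uni-lemma mode.
    """
    if mode not in ("multi", "uni"):
        raise ValueError(f"unknown mode: {mode!r}")
    if not lemmas:
        return False  # a predicate needs at least one lemma
    if len(set(lemmas)) != len(lemmas):
        return False  # the same lemma listed twice
    # Multi-lemma mode accepts any number of lemmas; uni-lemma mode only one.
    return mode == "multi" or len(lemmas) == 1


print(validate_lemmas(["open", "open up"], "multi"))
print(validate_lemmas(["open", "open up"], "uni"))
```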
    Frameset files in XML
    • All frameset files are written in XML, which is hard to edit by hand for those unfamiliar with it and can lead to errors.
    John: arg0 (agent)
    the door: arg1 (theme)
    with his foot: arg2 (instrument)
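
    As a rough illustration, the annotated example (the predicate "opened" with sense open.01 and its three arguments) could be held in a small data structure like the one below. The class and field names (`Argument`, `Instance`, `describe`) are assumptions made for this sketch; they are not the format Propbank or Jubilee actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    label: str  # numbered argument label, e.g. "arg0"
    role: str   # informal role gloss from the example, e.g. "agent"
    text: str   # the words filling this argument


@dataclass
class Instance:
    predicate: str   # surface form of the verb in the sentence
    roleset_id: str  # sense ID defined in the frameset file
    arguments: List[Argument] = field(default_factory=list)

    def describe(self) -> List[str]:
        """Return one readable line for the predicate and each argument."""
        lines = [f"rel ({self.roleset_id}): {self.predicate}"]
        for arg in self.arguments:
            lines.append(f"{arg.label} ({arg.role}): {arg.text}")
        return lines


# The example from the poster, encoded with the hypothetical classes above.
instance = Instance(
    predicate="opened",
    roleset_id="open.01",
    arguments=[
        Argument("arg0", "agent", "John"),
        Argument("arg1", "theme", "the door"),
        Argument("arg2", "instrument", "with his foot"),
    ],
)

for line in instance.describe():
    print(line)
```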
    Lemma(s)
    Senses
    Multi-lemma: Roleset
    Uni-lemma: Frameset
    Mappings between syntactic and semantic arguments
    Argument structure for the selected sense
    Examples for the selected sense
    <frameset>
      <predicate lemma="open">
        <roleset id="open.01" name="open" vncls="40.3.2 45.4 47.6">
          <roles>
            <role descr="opener" n="0">
              <vnrole vncls="47.6" vntheta="Agent"/>
              <vnrole vncls="40.3.2" vntheta="Agent"/>
              <vnrole vncls="45.4" vntheta="Agent"/>
            </role>
            <role descr="thing opening" n="1">
              <vnrole vncls="47.6" vntheta="Theme"/>
              <vnrole vncls="40.3.2" vntheta="Patient"/>
              <vnrole vncls="45.4" vntheta="Patient"/>
            </role>
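
    To show how such a file can be read programmatically, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The excerpt above is cut off after the second role, so the closing tags in `FRAMESET_XML` are added here only to make the snippet parseable; a real frameset file may contain further roles and examples. The function name `summarize_rolesets` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The excerpt from the poster, with closing tags added (an assumption)
# so that it parses as a complete document.
FRAMESET_XML = """
<frameset>
  <predicate lemma="open">
    <roleset id="open.01" name="open" vncls="40.3.2 45.4 47.6">
      <roles>
        <role descr="opener" n="0">
          <vnrole vncls="47.6" vntheta="Agent"/>
          <vnrole vncls="40.3.2" vntheta="Agent"/>
          <vnrole vncls="45.4" vntheta="Agent"/>
        </role>
        <role descr="thing opening" n="1">
          <vnrole vncls="47.6" vntheta="Theme"/>
          <vnrole vncls="40.3.2" vntheta="Patient"/>
          <vnrole vncls="45.4" vntheta="Patient"/>
        </role>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def summarize_rolesets(xml_text):
    """Map each roleset ID to its roles and their VerbNet theta roles."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for roleset in root.iter("roleset"):
        roles = []
        for role in roleset.iter("role"):
            # Collect the distinct VerbNet theta roles mapped to this argument.
            thetas = sorted({vn.get("vntheta") for vn in role.iter("vnrole")})
            roles.append((f"arg{role.get('n')}", role.get("descr"), thetas))
        summary[roleset.get("id")] = roles
    return summary


summary = summarize_rolesets(FRAMESET_XML)
for roleset_id, roles in summary.items():
    print(roleset_id)
    for label, descr, thetas in roles:
        print(f"  {label}: {descr} ({', '.join(thetas)})")
```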

    • Platform independent: runs on any platform with a JVM (Java 6.0).
    • Multilingual: accommodates Arabic, Chinese, English, Hindi and Korean.
    • Runs on X11: annotators can make updates remotely.
    • Easy customization: allows users to easily customize tags required for frameset annotations.
    • Free of XML: frameset authors do not need to know any XML.
    • Free of errors: frameset files created by Cornerstone are guaranteed to be free of errors.