Propbank Frameset Annotation Guidelines Using a Dedicated Editor, Cornerstone

This paper gives guidelines on how to create and update Propbank frameset files using a dedicated editor, Cornerstone. Propbank is a corpus in which the arguments of each verb predicate are annotated with their semantic roles in relation to the predicate. Propbank annotation also requires choosing a sense ID for each predicate. Thus, for each predicate in Propbank, there is a corresponding frameset file showing the expected predicate argument structure of each sense of that predicate. Since most Propbank annotations are based on the predicate argument structure defined in the frameset files, it is important to keep the files consistent, simple to read, and easy to update. The frameset files are written in XML, which can be difficult to edit in a simple text editor. It is therefore helpful to have a user-friendly editor such as Cornerstone, customized specifically for creating and editing frameset files. Cornerstone is platform independent, is light enough to run as an X11 application, and supports multiple languages, including Arabic, Chinese, English, Hindi and Korean.
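The relationship between an annotated instance and its frameset can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Roleset` and `Instance` and the helper `undefined_args` are hypothetical and are not part of Propbank or Cornerstone. The sense `open.01` and the descriptions of roles 0 and 1 come from the frameset example in this document; the description of role 2 is assumed from the "instrument" label in the annotated example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Roleset:
    """One sense of a predicate (e.g. open.01) and its numbered roles."""
    id: str
    roles: Dict[int, str]  # argument number -> role description


@dataclass
class Instance:
    """One annotated occurrence of a predicate in the corpus."""
    predicate: str
    roleset_id: str       # the sense ID chosen by the annotator
    args: Dict[int, str]  # argument number -> annotated text span


def undefined_args(instance: Instance, roleset: Roleset) -> List[int]:
    """Return argument numbers used in the annotation but not defined in the roleset."""
    return sorted(n for n in instance.args if n not in roleset.roles)


# Roles 0 and 1 follow the open.01 frameset in this document.
# Role 2 is an assumption based on the "arg2 (instrument)" label in the example.
open_01 = Roleset("open.01", {0: "opener", 1: "thing opening", 2: "instrument"})

# "John opened the door with his foot"
sentence = Instance(
    predicate="opened",
    roleset_id="open.01",
    args={0: "John", 1: "the door", 2: "with his foot"},
)

assert sentence.roleset_id == open_01.id
print(undefined_args(sentence, open_01))  # prints []
```

The point of the sketch is that every argument label in an instance is interpreted against the roleset of the chosen sense, which is why inconsistent frameset files lead directly to inconsistent annotations.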

Published in: Technology

Jinho Choi, Claire Bonial, Martha Palmer
Institute of Cognitive Science, University of Colorado at Boulder

Propbank

  • A corpus in which the arguments of each verb predicate are annotated with their semantic roles.
  • Each predicate is also annotated with its sense ID, supplied through frameset files.

Example annotation: John [arg0 (agent)] opened [open.01] the door [arg1 (theme)] with his foot [arg2 (instrument)]

Frameset Files

  • Frameset files outline the argument structure for each sense of every predicate in the Propbank.
  • Annotators use the semantic and syntactic information provided in frameset files to efficiently make consistent annotations.
  • Propbank annotations are based on the predicate argument structure outlined in frameset files.
  • There is one frameset file per verb, for all verbs in Propbank, including verb-particle constructions.
  • It is important to keep the frameset files consistent, simple to read, and easy to update.

Frameset files in XML

  • All frameset files are written in XML, which is complicated for those who are not familiar with it and can lead to errors.

    <frameset>
    <predicate lemma="open">
    <roleset id="open.01" name="open" vncls="40.3.2 45.4 47.6">
    <roles>
    <role descr="opener" n="0">
      <vnrole vncls="47.6" vntheta="Agent"/>
      <vnrole vncls="40.3.2" vntheta="Agent"/>
      <vnrole vncls="45.4" vntheta="Agent"/>
    </role>
    <role descr="thing opening" n="1">
      <vnrole vncls="47.6" vntheta="Theme"/>
      <vnrole vncls="40.3.2" vntheta="Patient"/>
      <vnrole vncls="45.4" vntheta="Patient"/>
    </role>
    …

Cornerstone

Multi-lemma mode

  • A predicate can have multiple lemmas (e.g., open, open up).
  • Languages: English, Hindi

Uni-lemma mode

  • A predicate can have only one lemma.
  • Languages: Arabic, Chinese

Labeled parts of the editor window:

  • Lemma(s)
  • Senses (multi-lemma: roleset; uni-lemma: frameset)
  • Mappings between syntactic and semantic arguments
  • Argument structure for the selected sense
  • Examples for the selected sense

Advantages and Features

  • Platform independent: runs on any platform with a JVM (Java 6.0).
  • Multilingual: accommodates Arabic, Chinese, English, Hindi and Korean.
  • Runs on X11: annotators can make updates remotely.
  • Easy customization: allows users to easily customize tags required for frameset annotations.
  • Free of XML: frameset authors do not need to know any XML.
  • Free of errors: frameset files created by Cornerstone are guaranteed to be free of errors.

More about Cornerstone

How to obtain Cornerstone

  • Available as an open source project on Google code (http://code.google.com/p/propbank).
  • The project also provides a Propbank instance editor, Jubilee.
  • Both Cornerstone and Jubilee have been used in several universities.
  • Contact: choijd@colorado.edu

Acknowledgements

We gratefully acknowledge the support of the National Science Foundation Grants CISE-CRI-0551615, Towards a Comprehensive Linguistic Annotation and CISE-CRI 0709167, Collaborative: A Multi-Representational and Multi-Layered Treebank for Hindi/Urdu, and a grant from the Defense Advanced Research Projects Agency (DARPA/IPTO) under the GALE program, DARPA/CMO Contract No. HR0011-06-C-0022, subcontract from BBN, Inc. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
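
The open.01 frameset fragment shown earlier can also be read programmatically, which may help when inspecting frameset files outside an editor. The sketch below uses Python's standard `xml.etree.ElementTree`. It rests on several assumptions: the source fragment is truncated, so closing tags have been added here and only roles 0 and 1 are included; the function names `read_rolesets` and `undeclared_vn_classes` are hypothetical; and the consistency check is an illustration only, not a description of how Cornerstone validates its files.

```python
import xml.etree.ElementTree as ET

# Fragment from this document, with closing tags added so that it parses.
# The original is truncated after role 1; nothing beyond that is filled in.
FRAMESET_XML = """
<frameset>
  <predicate lemma="open">
    <roleset id="open.01" name="open" vncls="40.3.2 45.4 47.6">
      <roles>
        <role descr="opener" n="0">
          <vnrole vncls="47.6" vntheta="Agent"/>
          <vnrole vncls="40.3.2" vntheta="Agent"/>
          <vnrole vncls="45.4" vntheta="Agent"/>
        </role>
        <role descr="thing opening" n="1">
          <vnrole vncls="47.6" vntheta="Theme"/>
          <vnrole vncls="40.3.2" vntheta="Patient"/>
          <vnrole vncls="45.4" vntheta="Patient"/>
        </role>
      </roles>
    </roleset>
  </predicate>
</frameset>
"""


def read_rolesets(xml_text):
    """Map each roleset id to its declared VerbNet classes and its roles."""
    root = ET.fromstring(xml_text)
    rolesets = {}
    for roleset in root.iter("roleset"):
        roles = {}
        for role in roleset.iter("role"):
            # VerbNet class -> thematic role for this numbered argument
            thetas = {v.get("vncls"): v.get("vntheta") for v in role.iter("vnrole")}
            # Assumes numeric role labels, as in the fragment above
            roles[int(role.get("n"))] = (role.get("descr"), thetas)
        rolesets[roleset.get("id")] = {
            "vncls": set(roleset.get("vncls", "").split()),
            "roles": roles,
        }
    return rolesets


def undeclared_vn_classes(rolesets):
    """List (roleset id, role number, class) where a vnrole names a class
    that the enclosing roleset does not declare. Illustrative check only."""
    problems = []
    for rs_id, info in rolesets.items():
        for n, (_descr, thetas) in info["roles"].items():
            for cls in thetas:
                if cls not in info["vncls"]:
                    problems.append((rs_id, n, cls))
    return problems


rolesets = read_rolesets(FRAMESET_XML)
for n, (descr, thetas) in sorted(rolesets["open.01"]["roles"].items()):
    print(f"arg{n}: {descr} {thetas}")
print(undeclared_vn_classes(rolesets))
```

Even this small fragment needs nested loops and attribute lookups to reach a role description, which gives some sense of why editing such files by hand in a text editor is error-prone.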
