Functional Requirements for an Interlinear Text Editor


Paper at LREC2004 (May 2004, Lisbon)


  1. Functional Requirements for an Interlinear Text Editor
     Baden Hughes (1), Catherine Bow (1) and Steven Bird (1,2)
     (1) University of Melbourne
     (2) Linguistic Data Consortium, University of Pennsylvania
  2. Overview
     • Introduction
     • Motivation
     • Selection Process
     • Evaluation Process
     • Functional Requirements
     • Conclusion
  3. Introduction
     • Interlinear text is a highly prevalent linguistic data type, both in field linguistic data and in collated corpora
  4. Motivation
     • Previous work has provided an open interlinear encoding standard using XML technologies and demonstrated the flexibility of such an approach
         ◦ Bow, Hughes & Bird 2003; Hughes, Bird & Bow 2003
     • Survey-based results of common functionality across a range of interlinear text handling applications
     • Motivated by the need to build a new interlinear text editing tool and a re-usable API for XML-based interlinear text
  5. Selection Process
     • Discovered 40+ linguistically-grounded applications with at least some interlinear functionality
     • Technically-oriented selection criteria
         ◦ end-user applications rather than application development frameworks
         ◦ obtainable at low or zero cost
         ◦ require only a moderate level of technology literacy to install and use
         ◦ applications which can be used in multiple contexts rather than for a specialised single use
         ◦ support for both unimodal and multimodal data
         ◦ exclusion of presentation-oriented applications
  6. Evaluation Process
     • Use of real linguistic data, motivated by the need to:
         ◦ Replicate typical use patterns
         ◦ Establish a data baseline for comparison
     • Cross-platform evaluation where possible
     • Linguistically-oriented evaluation criteria from a functional perspective
         ◦ General editing
         ◦ Structural segmentation and alignment
         ◦ Flexible content model
         ◦ Import and export capability
         ◦ Non-Roman script / Unicode
         ◦ Customisable presentation output
  7. Functional Requirements
     • Seeking commonly implemented functions for working with interlinear text, and the degrees of granularity at which these functions can be implemented
     • Functions derived from previous work which has contributed to the definition of the range and type of operations performed on interlinear text
         ◦ Bickford 1997; Kew & McConnell 1997; Maeda & Bird 2000; Bird et al 2002; Maeda et al 2002
     • Functions derived from selection process
         ◦ Application and API
         ◦ Usable through whole project lifecycle
         ◦ Multimodal and unimodal support
         ◦ Cross-platform API
         ◦ Freely redistributable
     • Functions derived from evaluation process …
  8. General Editing Functions
     • Text selection
         ◦ one or more constituents at morph, word or phrase level
         ◦ differentiate content from structure: select across morph/word/phrase cells and obtain content, structure or both
     • Cut, copy & paste
         ◦ any unit of selected text, with or without rendered orthographic support
         ◦ combinations will facilitate split and merge type actions
         ◦ multiple selection clipboard
     • Search
         ◦ regular expressions
         ◦ within selection/range
         ◦ multiple files
         ◦ cache of previous searches
         ◦ result navigation within text or index
     • Replace
         ◦ as for search, with the addition of optional replacement within text or index
     • Multi-level undo and redo
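The "regular expressions" and "within selection/range" search requirements can be illustrated with a minimal sketch. The `Word`, `Hit` and `search` names are hypothetical, the two-tier word model is an assumption made for illustration, and the Swahili phrase is toy data; none of this comes from the surveyed applications.

```python
import re
from dataclasses import dataclass

@dataclass
class Word:
    form: str   # segmented surface form (hypothetical tier name)
    gloss: str  # aligned gloss (hypothetical tier name)

@dataclass
class Hit:
    index: int  # position of the matching word in the phrase
    tier: str   # tier that was searched
    text: str   # the matched substring

def search(words, pattern, tier="form", start=0, end=None):
    """Regex search over one tier, restricted to words[start:end]."""
    rx = re.compile(pattern)
    end = len(words) if end is None else min(end, len(words))
    hits = []
    for i in range(start, end):
        m = rx.search(getattr(words[i], tier))
        if m:
            hits.append(Hit(i, tier, m.group(0)))
    return hits

# Toy data for illustration only.
phrase = [
    Word("ni-li-soma", "1SG-PST-read"),
    Word("kitabu", "book"),
    Word("ni-ta-soma", "1SG-FUT-read"),
]

all_hits = search(phrase, r"soma$")               # whole phrase
ranged_hits = search(phrase, r"soma$", start=1)   # selection only
gloss_hits = search(phrase, r"\bPST\b", tier="gloss")
```

Returning word indices rather than strings is one way to support "result navigation within text or index": the editor can jump to the cell, or list the hits separately.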
  9. Segmentation and Alignment
     • Granularity of segmentation and alignment
         ◦ Support for morph, word or phrase segmentation
         ◦ Annotation attachment to a range of morphs, words or phrases
     • Ontology support
         ◦ Links to discipline standard (e.g. GOLD)
         ◦ Links to user-specified ontologies for annotations
     • Multimodal integration
         ◦ Any combination of: text, text + audio, text + video, audio + video, text + audio + video
         ◦ User-extensible annotation tiers
         ◦ Cross-resource linking (e.g. XML ID/IDREF construct)
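As a rough illustration of attaching an annotation to a range of morphs through an ID/IDREF-style link, the sketch below uses Python's standard `xml.etree.ElementTree`. The element and attribute names (`phrase`, `word`, `morph`, `annotation`, `refs`) are assumptions made for this example and are not the encoding from the authors' earlier work.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: annotations point at morphs by id.
doc = ET.fromstring("""
<text>
  <phrase id="p1">
    <word id="w1">
      <morph id="m1">ni</morph><morph id="m2">li</morph><morph id="m3">soma</morph>
    </word>
    <word id="w2"><morph id="m4">kitabu</morph></word>
  </phrase>
  <annotation type="gloss" refs="m1">1SG</annotation>
  <annotation type="gloss" refs="m2">PST</annotation>
  <annotation type="note" refs="m1 m2 m3">inflected verb</annotation>
</text>
""")

# Index every element that carries an id, as an ID/IDREF resolver would.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id")}

def resolve(annotation):
    """Return the elements an annotation refers to, in order."""
    return [by_id[ref] for ref in annotation.get("refs", "").split()]

def dangling_refs(root, index):
    """List references that do not resolve to any known id."""
    return [ref
            for ann in root.iter("annotation")
            for ref in ann.get("refs", "").split()
            if ref not in index]

note = doc.findall("annotation")[2]
note_targets = [el.text for el in resolve(note)]
unresolved = dangling_refs(doc, by_id)
```

A validating parser with a DTD or schema would enforce ID uniqueness and reference integrity itself; the explicit `dangling_refs` check stands in for that here because `ElementTree` does not validate.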
  10. Flexible Content Models
     • Incomplete annotation
         ◦ ambiguous (multi-segment)
         ◦ partial annotations
         ◦ free text annotations
     • Standoff annotation
         ◦ open format
         ◦ non-resource dependent
         ◦ structurally constrained and linked
     • Ontology support
         ◦ Links to discipline standard (e.g. GOLD)
         ◦ Links to user-specified ontologies
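One possible reading of standoff annotation with incomplete content is sketched below: annotations live apart from the source text and point into it by character offsets, a span may carry several competing values (ambiguous), and a value may be missing (partial). The `Standoff` record and its field names are hypothetical, and the data is a toy example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Standoff:
    start: int             # inclusive character offset into the source
    end: int               # exclusive character offset
    tier: str              # e.g. "gloss" or "free" (hypothetical tier names)
    value: Optional[str]   # None marks an annotation not yet filled in

source = "nilisoma kitabu"  # the source text itself is never modified

annotations = [
    Standoff(0, 2, "gloss", "1SG"),
    Standoff(2, 4, "gloss", "PST"),
    Standoff(4, 8, "gloss", "read"),    # two values on one span:
    Standoff(4, 8, "gloss", "study"),   # an ambiguous annotation
    Standoff(9, 15, "gloss", None),     # a partial annotation
    Standoff(0, 15, "free", "I read the book"),
]

def text_of(a):
    """Recover the stretch of source text an annotation covers."""
    return source[a.start:a.end]

def is_valid(a):
    """Structural constraint: the span must lie inside the source."""
    return 0 <= a.start < a.end <= len(source)

def ambiguous(items):
    """Group values by span and tier; keep spans with more than one value."""
    groups = defaultdict(list)
    for a in items:
        if a.value is not None:
            groups[(a.start, a.end, a.tier)].append(a.value)
    return {k: v for k, v in groups.items() if len(v) > 1}

incomplete = [(a.start, a.end) for a in annotations if a.value is None]
```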
  11. Import and Export
     • Native XML data format
         ◦ Support for DTD or schema based XML interlinearised materials
     • Format conversion
         ◦ Support for common interlinear formats such as
             ▪ Shoebox/Toolbox
             ▪ ELAN
             ▪ TASX
             ▪ AGTK/InterTrans
             ▪ Parsers for SGML/HTML/XML
     • Change/Version control
         ◦ Internal provenance tracking
         ◦ Links to external change/version control systems, e.g. CVS/RCS/Subversion/MKS …
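To make the format-conversion requirement concrete, here is a simplified sketch that reads a backslash-marker record of the kind used by Shoebox/Toolbox and emits XML. It assumes the markers `\ref`, `\tx`, `\mb`, `\ge` and `\ft`, whitespace-separated words and hyphen-separated morphs; real project files configure their own markers and may wrap lines, which this sketch ignores. The output element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Toy record; marker names are assumed, not read from a project settings file.
SAMPLE = r"""
\ref 001
\tx nilisoma kitabu
\mb ni-li-soma kitabu
\ge 1SG-PST-read book
\ft I read the book.
"""

def parse_records(text):
    """Split marker lines into one dict per record (a record starts at \\ref)."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines; continuation lines are not handled
        marker, _, value = line[1:].partition(" ")
        if marker == "ref":
            current = {"ref": value.strip()}
            records.append(current)
        elif current is not None:
            current[marker] = value.strip()
    return records

def to_xml(records):
    """Build a small XML tree with one <phrase> per record."""
    root = ET.Element("interlinear-text")
    for rec in records:
        phrase = ET.SubElement(root, "phrase", id="p" + rec["ref"])
        rows = zip(rec.get("tx", "").split(),
                   rec.get("mb", "").split(),
                   rec.get("ge", "").split())
        for form, morphs, glosses in rows:
            word = ET.SubElement(phrase, "word", form=form)
            for m, g in zip(morphs.split("-"), glosses.split("-")):
                ET.SubElement(word, "morph", gloss=g).text = m
        ET.SubElement(phrase, "translation").text = rec.get("ft", "")
    return root

records = parse_records(SAMPLE)
root = to_xml(records)
xml_text = ET.tostring(root, encoding="unicode")
```

The record reference is prefixed with `p` because an XML ID may not begin with a digit.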
  12. Non-Roman Scripts
     • Unicode from Day 1
         ◦ Flexible encodings
             ▪ UTF-8 and UTF-16
         ◦ Retain support for legacy code pages
     • Rendering for non-Roman scripts (NRS)
         ◦ Data entry using
             ▪ Native keyboarding
             ▪ Glyph map
             ▪ Unicode character codes
         ◦ Using open-source off-the-shelf Unicode rendering toolkits rather than reimplementing
     • Directionality
         ◦ Horizontal (L>R/R>L) support
         ◦ Vertical (T>B/B>T) modality support
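The encoding and directionality points can be sketched with the Python standard library alone. The example assumes Windows-1251 as the legacy code page and uses a first-strong-character heuristic for horizontal direction; the helper names are hypothetical, and vertical layout is a rendering concern this sketch does not cover.

```python
import unicodedata

# Legacy code page in, Unicode out (cp1251 is only an example code page).
legacy_bytes = "слово".encode("cp1251")
text = legacy_bytes.decode("cp1251")
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16")   # Python prepends a byte order mark here

def from_code_points(*codes):
    """Data entry by Unicode character code, given as hex strings."""
    return "".join(chr(int(code, 16)) for code in codes)

def base_direction(s):
    """Guess horizontal direction from the first strongly directional character."""
    for ch in s:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "ltr"  # default when no strong character is present

hebrew = from_code_points("05E9", "05DC", "05D5", "05DD")
```

A full implementation would rely on a rendering toolkit's bidirectional support rather than this heuristic, in line with the point about not reimplementing Unicode rendering.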
  13. Presentation Output
     • Text as Image
         ◦ Raster Formats
             ▪ GIF, JPEG, TIFF, EPS
         ◦ Vector Formats
             ▪ SVG
     • Text in Presentation Format
         ◦ PDF, RTF, HTML
     • Customisable Presentation
         ◦ HTML + CSS (including user-specified CSS)
         ◦ XML + XSL (including user-specified XSL; Hughes, Bird & Bow 2003 demonstrate a range of transformations for interlinear text using XSL)
         ◦ Publisher's Templates
         ◦ Interface with 3rd-party XSL engines
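The "HTML + CSS (including user-specified CSS)" option might look something like the sketch below, where the stylesheet is a parameter the user can replace. The class names, the default CSS and the `render_html` function are assumptions for illustration; the XSL route is not shown because it needs a third-party XSLT engine.

```python
from html import escape

# Default layout: each word is a small stacked block, so form and gloss align.
DEFAULT_CSS = (
    ".word { display: inline-block; margin-right: 1em; } "
    ".word span { display: block; } "
    ".gloss { font-variant: small-caps; }"
)

def render_html(words, translation, css=DEFAULT_CSS):
    """Render (form, gloss) pairs and a free translation as an HTML page."""
    cells = "".join(
        '<div class="word">'
        f'<span class="form">{escape(form)}</span>'
        f'<span class="gloss">{escape(gloss)}</span>'
        "</div>"
        for form, gloss in words
    )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<style>{css}</style></head><body>"
        f'<div class="phrase">{cells}'
        f'<p class="free">{escape(translation)}</p></div>'
        "</body></html>"
    )

words = [("ni-li-soma", "1SG-PST-read"), ("kitabu", "book")]  # toy data
default_page = render_html(words, "I read the book.")
custom_page = render_html(words, "I read the book.", css=".gloss { color: gray; }")
```

Keeping the markup fixed and swapping only the stylesheet means a user can change the presentation without touching the data.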
  14. Conclusion
     • Survey-based approach to specification of functional requirements allows us to build a best-of-breed interlinear application
     • Implementing within an open source framework, e.g. AGTK and NLTK
     • Additional resources at:
  15. Acknowledgements
     • The research reported here is supported by the National Science Foundation:
         ◦ Grant #0094934 Electronic Metastructure for Endangered Language Data
         ◦ Grant #998009 TalkBank
         ◦ Grant #0317826 Querying Linguistic Databases