Functional Requirements for an Interlinear Text Editor

Paper at LREC2004 (May 2004, Lisbon)

  • Functional Requirements for an Interlinear Text Editor
    • Baden Hughes¹, Catherine Bow¹ and Steven Bird¹,²
    • ¹ University of Melbourne; ² Linguistic Data Consortium, University of Pennsylvania
  • Overview
    • Introduction
    • Motivation
    • Selection Process
    • Evaluation Process
    • Functional Requirements
    • Conclusion
  • Introduction
    • Interlinear text is a highly prevalent linguistic data type, found both in field linguistic data and in collated corpora
  • Motivation
    • Previous work has provided an open interlinear encoding standard using XML technologies and demonstrated the flexibility of such an approach
      • Bow, Hughes & Bird 2003; Hughes, Bird & Bow 2003
    • Survey-based results on the functionality common to a range of interlinear text handling applications
    • Motivated by the need to build a new interlinear text editing tool and a reusable API for XML-based interlinear text
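The slides do not reproduce the XML encoding itself. As a minimal sketch, assuming an illustrative text > phrase > word > morph hierarchy in which every annotation is an `<item type="...">` child (these element and attribute names are assumptions, not the published schema), the Python below shows how a small reusable API could pull one annotation tier out of such a document using the standard library's `xml.etree.ElementTree`. The Swahili phrase is toy data.

```python
import xml.etree.ElementTree as ET

# Illustrative encoding only: element names (text/phrase/word/morph/item) and
# the item "type" values are assumptions made for this sketch.
SAMPLE = """\
<text id="T1">
  <phrase id="p1">
    <item type="free">good book</item>
    <word id="w1">
      <item type="txt">kitabu</item>
      <morph id="m1"><item type="txt">ki-</item><item type="gloss">7-</item></morph>
      <morph id="m2"><item type="txt">tabu</item><item type="gloss">book</item></morph>
    </word>
    <word id="w2">
      <item type="txt">kizuri</item>
      <morph id="m3"><item type="txt">ki-</item><item type="gloss">7-</item></morph>
      <morph id="m4"><item type="txt">zuri</item><item type="gloss">good</item></morph>
    </word>
  </phrase>
</text>
"""


def tier(root, level, item_type):
    """Collect one annotation tier: the text of the <item type=item_type> child
    of every element named `level` (e.g. all morph glosses), in document order."""
    return [el.findtext(f"item[@type='{item_type}']") for el in root.iter(level)]


def morphs_of(root, word_id):
    """Return (form, gloss) pairs for the morphs of one word, looked up by its id."""
    word = root.find(f".//word[@id='{word_id}']")
    return [(m.findtext("item[@type='txt']"), m.findtext("item[@type='gloss']"))
            for m in word.iter("morph")]


root = ET.fromstring(SAMPLE)
print(tier(root, "word", "txt"))   # ['kitabu', 'kizuri']
print(morphs_of(root, "w2"))       # [('ki-', '7-'), ('zuri', 'good')]
```

Because a tier is addressed only by element name and `type` value, adding a new annotation tier to the data needs no change to these functions.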
  • Selection Process
    • Discovered 40+ linguistically-grounded applications with at least some interlinear functionality
    • Technically-oriented selection criteria
      • end user applications rather than application development frameworks
      • obtainable at low or zero cost
      • require only a moderate level of technical literacy to install and use
      • applications usable in multiple contexts rather than for a single specialised use
      • support for both unimodal and multimodal data
      • exclusion of presentation-oriented applications
  • Evaluation Process
    • Use of real linguistic data, in order to
      • Replicate typical use patterns
      • Establish a data baseline for comparison
    • Cross-platform evaluation where possible
    • Linguistically-oriented evaluation criteria from a functional perspective
      • General editing
      • Structural segmentation and alignment
      • Flexible content model
      • Import and export capability
      • Non-Roman Script / Unicode
      • Customisable presentation output
  • Functional Requirements
    • Seeking commonly implemented functions for working with interlinear text, and the degrees of granularity at which these functions can be implemented
    • Functions derived from previous work that has helped define the range and type of operations performed on interlinear text
      • Bickford 1997; Kew & McConnell 1997; Maeda & Bird 2000; Bird et al. 2002; Maeda et al. 2002
    • Functions derived from selection process
      • Application and API
      • Usable through whole project lifecycle
      • Multimodal and unimodal support
      • Cross-platform API
      • Freely redistributable
    • Functions derived from evaluation process …
  • General Editing Functions
    • Text selection
      • one or more constituents at morph, word, phrase level
      • differentiate content from structure – select across morph/word/phrase cells and obtain content, structure or both
    • Cut, copy & paste
      • any unit of selected text, with or without rendered orthographic support
      • combinations of these operations support split and merge actions
      • multiple selection clipboard
    • Search
      • regular expressions
      • within selection/range
      • multiple files
      • cache of previous searches
      • result navigation within text or index
    • Replace
      • As for search, with the addition of:
      • Optional replacement within text or index
    • Multi-level undo and redo
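Split, merge and multi-level undo/redo can be sketched with two stacks of reversible edits. The `TokenEditor` class below is hypothetical and works on a flat list of word tokens rather than on full interlinear structure; it also shows regular-expression search restricted to a range of tokens.

```python
import re


class TokenEditor:
    """Toy editor over a flat list of word tokens (hypothetical, for illustration).

    Each edit is stored as a (redo_fn, undo_fn) pair, so any sequence of
    splits and merges can be reversed and replayed one step at a time.
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self._undo_stack = []
        self._redo_stack = []

    def _apply(self, redo_fn, undo_fn):
        redo_fn()
        self._undo_stack.append((redo_fn, undo_fn))
        self._redo_stack.clear()  # a new edit discards the redo history

    def split(self, i, at):
        """Split token i into two tokens at character offset `at`."""
        original = self.tokens[i]
        left, right = original[:at], original[at:]

        def redo_fn():
            self.tokens[i:i + 1] = [left, right]

        def undo_fn():
            self.tokens[i:i + 2] = [original]

        self._apply(redo_fn, undo_fn)

    def merge(self, i):
        """Merge token i with the token that follows it."""
        left, right = self.tokens[i], self.tokens[i + 1]

        def redo_fn():
            self.tokens[i:i + 2] = [left + right]

        def undo_fn():
            self.tokens[i:i + 1] = [left, right]

        self._apply(redo_fn, undo_fn)

    def undo(self):
        if self._undo_stack:
            redo_fn, undo_fn = self._undo_stack.pop()
            undo_fn()
            self._redo_stack.append((redo_fn, undo_fn))

    def redo(self):
        if self._redo_stack:
            redo_fn, undo_fn = self._redo_stack.pop()
            redo_fn()
            self._undo_stack.append((redo_fn, undo_fn))

    def search(self, pattern, start=0, end=None):
        """Indices of tokens in tokens[start:end] that match a regular expression."""
        rx = re.compile(pattern)
        stop = len(self.tokens) if end is None else end
        return [i for i in range(start, stop) if rx.search(self.tokens[i])]


ed = TokenEditor(["kitabu", "kizuri"])
ed.split(0, 2)   # ['ki', 'tabu', 'kizuri']
ed.merge(1)      # ['ki', 'tabukizuri']
ed.undo()        # ['ki', 'tabu', 'kizuri']
ed.undo()        # ['kitabu', 'kizuri']
ed.redo()        # ['ki', 'tabu', 'kizuri']
print(ed.tokens, ed.search(r"^ki"))   # ['ki', 'tabu', 'kizuri'] [0, 2]
```

Storing closures rather than snapshots keeps memory proportional to the number of edits, not to the size of the text.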
  • Segmentation and Alignment
    • Granularity of segmentation and alignment
      • Support for morph, word or phrase segmentation
      • Annotation attachment to range of morphs, words or phrases
    • Ontology support
      • Links to discipline standard (eg GOLD)
      • Links to user specified ontologies for annotations
    • Multimodal integration
      • Any combination of: text, text + audio, text + video, audio + video, text + audio + video
      • user extensible annotation tiers
      • Cross-resource linking (eg XML ID/IDREF construct)
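One way to meet these segmentation and alignment requirements is to keep the morph sequence as the base and attach every annotation (a word, a phrase or a free note) to a range of morph IDs, optionally with media offsets for multimodal data. The sketch below assumes this design; the class names, tier names and timings are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Morph:
    id: str
    form: str


@dataclass
class Annotation:
    tier: str    # user-extensible tier name, e.g. "word", "phrase", "note"
    start: str   # IDREF: first morph in the annotated range
    end: str     # IDREF: last morph in the range (inclusive)
    value: str
    media_start: Optional[float] = None  # seconds into an audio/video file, if aligned
    media_end: Optional[float] = None


class AlignedText:
    def __init__(self, morphs: List[Morph]):
        self.morphs = morphs
        # ID -> position; plays the role of an XML ID/IDREF lookup
        self.index = {m.id: i for i, m in enumerate(morphs)}
        self.annotations: List[Annotation] = []

    def annotate(self, ann: Annotation) -> None:
        """Attach an annotation after checking that its range resolves."""
        if ann.start not in self.index or ann.end not in self.index:
            raise KeyError(f"unknown morph ID in {ann}")
        if self.index[ann.start] > self.index[ann.end]:
            raise ValueError("annotation range ends before it starts")
        self.annotations.append(ann)

    def covered(self, ann: Annotation) -> List[str]:
        """Forms of the morphs that an annotation spans."""
        i, j = self.index[ann.start], self.index[ann.end]
        return [m.form for m in self.morphs[i:j + 1]]

    def tier(self, name: str) -> List[Annotation]:
        return [a for a in self.annotations if a.tier == name]

    def at_time(self, seconds: float) -> List[Annotation]:
        """Annotations whose media interval contains the given offset."""
        return [a for a in self.annotations
                if a.media_start is not None and a.media_end is not None
                and a.media_start <= seconds < a.media_end]


text = AlignedText([Morph("m1", "ki-"), Morph("m2", "tabu"),
                    Morph("m3", "ki-"), Morph("m4", "zuri")])
text.annotate(Annotation("word", "m1", "m2", "kitabu", 0.0, 0.6))   # made-up timings
text.annotate(Annotation("word", "m3", "m4", "kizuri", 0.6, 1.2))
text.annotate(Annotation("phrase", "m1", "m4", "good book", 0.0, 1.2))
print([text.covered(a) for a in text.tier("word")])  # [['ki-', 'tabu'], ['ki-', 'zuri']]
print([a.value for a in text.at_time(0.8)])          # ['kizuri', 'good book']
```

Treating words and phrases as ranged annotations over morphs means a new tier is just a new `tier` string, with no change to the data model.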
  • Flexible Content Models
    • Incomplete annotation
      • ambiguous (multi-segment)
      • partial annotations
      • free text annotations
    • Standoff annotation
      • open format
      • non-resource dependent
      • structurally constrained and linked
    • Ontology support
      • Links to discipline standard (eg GOLD)
      • Links to user specified ontologies
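Standoff annotation keeps each annotation layer in a separate document that points into the base text by ID. The minimal sketch below assumes JSON as the open format and a hypothetical entry shape (`ref`, `candidates`, optional `note`): several candidates model an ambiguous analysis, a missing entry models a partial annotation, and `dangling_refs` is the structural check that every link resolves. The data is a toy example.

```python
import json

# Base text: annotation layers depend only on these token IDs,
# so the base is never rewritten when a layer is added or changed.
BASE = {"tokens": [
    {"id": "t1", "form": "kitabu"},
    {"id": "t2", "form": "kizuri"},
    {"id": "t3", "form": "sana"},
]}

# A standoff gloss layer kept in its own document (toy data):
# t2 has two competing analyses, t3 is not annotated yet,
# and t1 carries a free-text note alongside its gloss.
GLOSS_LAYER = {"layer": "gloss", "entries": [
    {"ref": "t1", "candidates": ["7-book"], "note": "confirm with speaker"},
    {"ref": "t2", "candidates": ["7-good", "7-beautiful"]},
]}


def dangling_refs(base, layer):
    """IDs the layer points at that do not exist in the base text."""
    known = {t["id"] for t in base["tokens"]}
    return [e["ref"] for e in layer["entries"] if e["ref"] not in known]


def annotation_status(base, layer):
    """Classify each base token as 'unannotated', 'ambiguous' or 'resolved'."""
    by_ref = {e["ref"]: e for e in layer["entries"]}
    status = {}
    for tok in base["tokens"]:
        candidates = by_ref.get(tok["id"], {}).get("candidates", [])
        if not candidates:
            status[tok["id"]] = "unannotated"
        elif len(candidates) > 1:
            status[tok["id"]] = "ambiguous"
        else:
            status[tok["id"]] = "resolved"
    return status


# The layer round-trips through its own serialized form, independent of BASE.
serialized = json.dumps(GLOSS_LAYER, ensure_ascii=False, indent=2)
restored = json.loads(serialized)
print(dangling_refs(BASE, restored))      # []
print(annotation_status(BASE, restored))  # {'t1': 'resolved', 't2': 'ambiguous', 't3': 'unannotated'}
```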
  • Import and Export
    • Native XML data format
      • Support for DTD or schema based XML interlinearised materials
    • Format conversion
      • Support for common interlinear formats such as
        • Shoebox/Toolbox
        • ELAN
        • TASX
        • AGTK/InterTrans
        • Parsers for SGML/HTML/XML
    • Change/Version control
      • Internal provenance tracking
      • Links to external change/version control systems eg CVS/RCS/Subversion/MKS …
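As an illustration of format conversion, the following sketch turns one Shoebox/Toolbox-style record into XML. It assumes the record uses `\ref`, `\tx`, `\mb` and `\ge` markers and that the interlinear lines are column-aligned with spaces, so that each gloss starts in the same column as the morpheme it glosses; real projects often use other markers and settings, so this is not a general Toolbox importer. The output element and attribute names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

MARKER = re.compile(r"^\\(\w+)")  # a field starts with a backslash marker, e.g. \tx

# Toy record; lines are space-padded so columns line up across \tx, \mb and \ge.
RECORD = r"""\ref 001
\tx kitabu    kizuri
\mb ki-  tabu ki-  zuri
\ge 7-   book 7-   good
"""


def tokens_with_columns(line):
    """(start_column, token) pairs for the whitespace-separated tokens after the marker."""
    m = MARKER.match(line)
    body_start = m.end() if m else 0
    return [(t.start(), t.group()) for t in re.finditer(r"\S+", line)
            if t.start() >= body_start]


def record_to_xml(record):
    fields = {}
    for line in record.splitlines():
        m = MARKER.match(line)
        if m:
            fields[m.group(1)] = line
    phrase = ET.Element("phrase", id=tokens_with_columns(fields["ref"])[0][1])
    words = tokens_with_columns(fields["tx"])
    morphs = tokens_with_columns(fields.get("mb", ""))
    glosses = dict(tokens_with_columns(fields.get("ge", "")))  # column -> gloss
    for i, (col, form) in enumerate(words):
        # A morpheme belongs to the word whose column span it starts in.
        next_col = words[i + 1][0] if i + 1 < len(words) else float("inf")
        word_el = ET.SubElement(phrase, "word", form=form)
        for mcol, morph in morphs:
            if col <= mcol < next_col:
                morph_el = ET.SubElement(word_el, "morph", form=morph)
                if mcol in glosses:
                    morph_el.set("gloss", glosses[mcol])
    return phrase


print(ET.tostring(record_to_xml(RECORD), encoding="unicode"))
```

Aligning by column rather than by token count keeps a missing gloss from shifting every later gloss onto the wrong morpheme.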
  • Non-Roman Scripts
    • Unicode from Day 1
      • Flexible encodings
        • UTF-8 and UTF-16
      • Retain support for legacy code pages
    • Rendering for non-Roman scripts (NRS)
      • Data entry using
        • Native keyboarding
        • Glyph map
        • Unicode character codes
      • Use open-source, off-the-shelf Unicode rendering toolkits rather than reimplementing rendering
    • Directionality
      • Horizontal (left-to-right / right-to-left) support
      • Vertical (top-to-bottom / bottom-to-top) support
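Several of these requirements map directly onto standard Unicode machinery. The sketch below, using only Python's standard library, shows data entry by character code, the size of the same text in UTF-8 and UTF-16, decoding from a legacy code page, and a simplified guess at horizontal direction from the first strongly directional character. The helper names are hypothetical, and vertical layout is left to the rendering toolkit.

```python
import re
import unicodedata


def from_code_points(spec):
    """Data entry by Unicode character codes: 'U+0915 U+093F' -> the string they name."""
    return "".join(chr(int(h, 16)) for h in re.findall(r"U\+([0-9A-Fa-f]{4,6})", spec))


def encoded_sizes(text):
    """Byte length of the same text in UTF-8 and UTF-16 (little-endian, no BOM)."""
    return {"utf-8": len(text.encode("utf-8")),
            "utf-16": len(text.encode("utf-16-le"))}


def from_legacy(data, codepage):
    """Decode bytes from a legacy code page (any codec Python knows, e.g. 'cp1251')."""
    return data.decode(codepage)


def base_direction(text):
    """Horizontal direction taken from the first strongly directional character.

    A simplification of the paragraph-level rule of the Unicode Bidirectional
    Algorithm; it ignores isolates, embeddings and explicit overrides.
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return "neutral"


ka_i = from_code_points("U+0915 U+093F")        # Devanagari KA + vowel sign I
print(ka_i, encoded_sizes(ka_i))                # कि {'utf-8': 6, 'utf-16': 4}
print(from_legacy(b"\xcf\xf0\xe8", "cp1251"))   # При
print(base_direction("שלום"), base_direction("kitabu"), base_direction("123"))  # rtl ltr neutral
```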
  • Presentation Output
    • Text as Image
      • Raster Formats
        • GIF, JPEG, TIFF
      • Vector Formats
        • SVG, EPS
    • Text in Presentation Format
      • PDF, RTF, HTML
    • Customisable Presentation
      • HTML + CSS (including user specified CSS)
      • XML + XSL (including user specified XSL – Hughes, Bird & Bow 2003 demonstrate a range of transformations for interlinear text using XSL)
      • Publisher’s Templates
      • Interface with third-party XSL engines
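For the HTML + CSS route, a minimal sketch (function and CSS class names are hypothetical) renders one phrase with each morpheme's form stacked above its gloss, and swaps the built-in stylesheet for a user-specified one when a CSS URL is given. It uses only the standard library's `html.escape`.

```python
from html import escape

# Default layout: each word is an inline block, and each morpheme stacks
# its form above its gloss, so glosses stay aligned without a table.
DEFAULT_CSS = (
    ".word{display:inline-block;vertical-align:top;margin-right:1.5em}"
    ".form{display:block}"
    ".morph{display:inline-block;margin-right:0.4em}"
    ".morph>span{display:block}"
    ".gloss{font-variant:small-caps}"
)


def render_html(words, free=None, css_href=None):
    """Render one phrase; words is a list of (word_form, [(morph, gloss), ...]).

    If css_href is given, the user's stylesheet is linked instead of DEFAULT_CSS.
    """
    head = (f'<link rel="stylesheet" href="{escape(css_href)}">' if css_href
            else f"<style>{DEFAULT_CSS}</style>")
    body = []
    for form, morphs in words:
        cells = "".join(
            f'<span class="morph"><span>{escape(m)}</span>'
            f'<span class="gloss">{escape(g)}</span></span>'
            for m, g in morphs)
        body.append(f'<span class="word"><span class="form">{escape(form)}</span>{cells}</span>')
    if free:
        body.append(f'<p class="free">{escape(free)}</p>')  # free translation line
    return ('<!DOCTYPE html><html><head><meta charset="utf-8">' + head +
            '</head><body><div class="phrase">' + "".join(body) + "</div></body></html>")


words = [("kitabu", [("ki-", "7-"), ("tabu", "book")]),
         ("kizuri", [("ki-", "7-"), ("zuri", "good")])]
print(render_html(words, free="good book"))
```

Keeping all layout in CSS classes means a user stylesheet can restyle the output without touching the generator.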
  • Conclusion
    • A survey-based approach to specifying functional requirements allows us to build a best-of-breed interlinear application
    • Implementation within open-source frameworks, eg AGTK and NLTK
    • Additional resources at:
  • Acknowledgements
    • The research reported here is supported by the National Science Foundation:
      • Grant #0094934 Electronic Metastructure for Endangered Language Data
      • Grant #998009 TalkBank
      • Grant #0317826 Querying Linguistic Databases