Functional Requirements for an Interlinear Text Editor

    Functional Requirements for an Interlinear Text Editor - Presentation Transcript

    1. Functional Requirements for an Interlinear Text Editor. Baden Hughes¹, Catherine Bow¹ and Steven Bird¹,². ¹University of Melbourne; ²Linguistic Data Consortium, University of Pennsylvania
    2. Overview
      • Introduction
      • Motivation
      • Selection Process
      • Evaluation Process
      • Functional Requirements
      • Conclusion
    3. Introduction
      • Interlinear text is a highly prevalent linguistic data type, both in field linguistic data and in collated corpora
    4. Motivation
      • Previous work has provided an open interlinear encoding standard using XML technologies and demonstrated the flexibility of such an approach
        • Bow, Hughes & Bird 2003; Hughes, Bird & Bow 2003
      • Survey-based results of common functionality across a range of interlinear text handling applications
      • Motivated by the need to build a new interlinear text editing tool and a reusable API for XML-based interlinear text
    5. Selection Process
      • Discovered 40+ linguistically-grounded applications with at least some interlinear functionality
      • Technically-oriented selection criteria
        • end user applications rather than application development frameworks
        • obtainable at low or zero cost
        • require only a moderate level of technology literacy to install and use
        • applications which can be used in multiple contexts rather than a specialised single use
        • support for both unimodal and multimodal data
        • exclusion of presentation-oriented applications
    6. Evaluation Process
      • Use of real linguistic data, in order to
        • replicate typical use patterns
        • establish a data baseline for comparison
      • Cross-platform evaluation where possible
      • Linguistically-oriented evaluation criteria from a functional perspective
        • General editing
        • Structural segmentation and alignment
        • Flexible content model
        • Import and export capability
        • Non-Roman Script / Unicode
        • Customisable presentation output
    7. Functional Requirements
      • Seeking commonly implemented functions for working with interlinear text, and the degrees of granularity at which these functions can be implemented
      • Functions derived from previous work which has contributed to the definition of the range and type of operations performed on interlinear text
        • Bickford 1997; Kew & McConnell 1997; Maeda & Bird 2000; Bird et al. 2002; Maeda et al. 2002
      • Functions derived from selection process
        • Application and API
        • Usable through whole project lifecycle
        • Multimodal and unimodal support
        • Cross-platform API
        • Freely redistributable
      • Functions derived from evaluation process …
    8. General Editing Functions
      • Text selection
        • one or more constituents at morph, word, phrase level
        • differentiate content from structure – select across morph/word/phrase cells and obtain content, structure or both
      • Cut, copy & paste
        • any unit of selected text, with or without rendered orthographic support
        • combinations will facilitate split and merge type actions
        • multiple selection clipboard
      • Search
        • regular expressions
        • within selection/range
        • multiple files
        • cache of previous searches
        • result navigation within text or index
      • Replace
        • As for search, with the addition of:
        • Optional replacement within text or index
      • Multiple level redo and undo
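
The search requirements above (regular expressions, restriction to a selection or range) can be sketched as follows. This is a minimal illustration, not the authors' design: the `Word` class, the `search` function and the two-tier layout (form and gloss) are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    form: str   # surface form on the text tier
    gloss: str  # aligned gloss on the gloss tier


def search(words, pattern, tier="form", start=0, end=None):
    """Return indices of words whose chosen tier matches `pattern`.

    `start` and `end` restrict the search to a selection [start, end).
    """
    regex = re.compile(pattern)
    end = len(words) if end is None else end
    return [i for i in range(start, end)
            if regex.search(getattr(words[i], tier))]


# Illustrative data only.
words = [
    Word("los", "the.PL"),
    Word("perros", "dog-PL"),
    Word("corren", "run-3PL"),
    Word("rápido", "fast"),
]

hits_all = search(words, r"PL$", tier="gloss")                  # whole text
hits_sel = search(words, r"PL$", tier="gloss", start=1, end=4)  # selection only
hits_form = search(words, r"^p", tier="form")
```

Returning indices rather than the matched objects keeps result navigation simple: the same index list could drive a cursor in the text view or an entry in a search index.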
    9. Segmentation and Alignment
      • Granularity of segmentation and alignment
        • Support for morph, word or phrase segmentation
        • Annotation attachment to range of morphs, words or phrases
      • Ontology support
        • Links to discipline standard (eg GOLD)
        • Links to user specified ontologies for annotations
      • Multimodal integration
        • Any combination of: text, text + audio, text + video, audio + video, text + audio + video
        • user extensible annotation tiers
        • Cross-resource linking (eg XML ID/IDREF construct)
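
One possible data model for these requirements is sketched below: segmentation at morph, word and phrase level, with an annotation attached to a range of units through identifier references, loosely in the spirit of XML ID/IDREF. All class and function names here are hypothetical and are not taken from the presentation or from any existing tool.

```python
from dataclasses import dataclass


@dataclass
class Morph:
    id: str
    form: str
    gloss: str = ""


@dataclass
class Word:
    id: str
    morphs: list


@dataclass
class Phrase:
    id: str
    words: list
    free_translation: str = ""


@dataclass
class RangeAnnotation:
    label: str
    targets: list  # ids of the covered units (IDREF-style references)


def resolve(phrase, annotation):
    """Look up the units an annotation points at, by id."""
    index = {}
    for word in phrase.words:
        index[word.id] = word
        for morph in word.morphs:
            index[morph.id] = morph
    return [index[target] for target in annotation.targets]


# Illustrative data only.
w1 = Word("w1", [Morph("m1", "ni", "1SG"), Morph("m2", "li", "PST"),
                 Morph("m3", "soma", "read")])
w2 = Word("w2", [Morph("m4", "kitabu", "book")])
phrase = Phrase("p1", [w1, w2], "I read a book.")

note = RangeAnnotation("verbal prefixes", ["m1", "m2"])
covered_forms = [unit.form for unit in resolve(phrase, note)]
```

Because the annotation stores only ids, the same mechanism could point at words or at units in another resource, provided the ids are unique across resources.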
    10. Flexible Content Models
      • Incomplete annotation
        • ambiguous (multi-segment)
        • partial annotations
        • free text annotations
      • Standoff annotation
        • open format
        • non-resource dependent
        • structurally constrained and linked
      • Ontology support
        • Links to discipline standard (eg GOLD)
        • Links to user specified ontologies
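
Standoff annotation with partial and free-text annotations can be illustrated with a small sketch. It assumes, for the example only, that annotations are kept apart from the text and refer to it by character offsets; the dictionary keys and the `tier_view` helper are hypothetical.

```python
# The primary text is never modified; annotations point into it by offset.
text = "ni-li-soma kitabu"

annotations = [
    {"start": 0, "end": 2, "tier": "gloss", "value": "1SG"},
    {"start": 3, "end": 5, "tier": "gloss", "value": "PST"},
    {"start": 6, "end": 10, "tier": "gloss", "value": "read"},
    # Partial annotation: the second word has no gloss yet,
    # only a free-text note.
    {"start": 11, "end": 17, "tier": "note",
     "value": "check gloss with speaker"},
]


def tier_view(text, annotations, tier):
    """Pair each annotation on `tier` with the text span it covers."""
    return [(text[a["start"]:a["end"]], a["value"])
            for a in annotations if a["tier"] == tier]


glosses = tier_view(text, annotations, "gloss")
notes = tier_view(text, annotations, "note")
```

Nothing in this layout forces every span to carry every tier, which is what makes incomplete annotation unproblematic.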
    11. Import and Export
      • Native XML data format
        • Support for DTD or schema based XML interlinearised materials
      • Format conversion
        • Support for common interlinear formats such as
          • Shoebox/Toolbox
          • ELAN
          • TASX
          • AGTK/InterTrans
          • Parsers for SGML/HTML/XML
      • Change/Version control
        • Internal provenance tracking
        • Links to external change/version control systems eg CVS/RCS/Subversion/MKS …
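
As a rough sketch of format conversion, the code below reads backslash-marker records of the kind used by Shoebox/Toolbox and writes them out as XML. The markers `\ref`, `\tx`, `\ge` and `\ft` are assumed for illustration, the XML element names are invented, and continuation lines are ignored; a real converter would need to handle the full format.

```python
import xml.etree.ElementTree as ET


def parse_records(text):
    # Each line of interest looks like: backslash, marker, space, value.
    # A new record is assumed to start at each "ref" marker.
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # blank and continuation lines are skipped in this sketch
        marker, _, value = line[1:].partition(" ")
        if marker == "ref" or current is None:
            current = {}
            records.append(current)
        current[marker] = value.strip()
    return records


def to_xml(records):
    # Element and attribute names are hypothetical, not a published schema.
    root = ET.Element("interlinear-text")
    for record in records:
        phrase = ET.SubElement(root, "phrase", id=record.get("ref", ""))
        for marker, value in record.items():
            if marker != "ref":
                ET.SubElement(phrase, "tier", type=marker).text = value
    return ET.tostring(root, encoding="unicode")


sample = r"""
\ref 001
\tx ni-li-soma kitabu
\ge 1SG-PST-read book
\ft I read a book.

\ref 002
\tx kitabu
\ge book
"""

records = parse_records(sample)
xml_text = to_xml(records)
```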
    12. Non-Roman Scripts
      • Unicode from Day 1
        • Flexible encodings
          • UTF-8 and UTF-16
        • Retain support for legacy code pages
      • Rendering for NRS
        • Data entry using
          • Native keyboarding
          • Glyph map
          • Unicode character codes
        • Using open-source, off-the-shelf Unicode rendering toolkits rather than reimplementing them
      • Directionality
        • Horizontal (L>R/R>L) support
        • Vertical (T>B/B>T) modality support
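
The encoding and directionality points can be made concrete with a short sketch using only the Python standard library. The choice of code page (`cp1253`) and the `base_direction` helper are assumptions for the example; the helper is a simple first-strong-character heuristic, not a full implementation of the Unicode Bidirectional Algorithm, and it does not address vertical layout.

```python
import unicodedata

# Legacy code page in, Unicode out.
legacy_bytes = "Ελληνικά".encode("cp1253")  # stand-in for bytes from an old file
text = legacy_bytes.decode("cp1253")
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16")               # includes a byte order mark


def base_direction(s):
    """Guess horizontal direction from the first strongly directional character."""
    for ch in s:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "rtl"
        if bidi == "L":
            return "ltr"
    return "neutral"


direction_hebrew = base_direction("שלום")
direction_latin = base_direction("kitabu")
```

Decoding legacy data once at import and working in Unicode internally keeps the code page handling at the edges of the application.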
    13. Presentation Output
      • Text as Image
        • Raster Formats
          • GIF, JPEG, TIFF, EPS
        • Vector Formats
          • SVG
      • Text in Presentation Format
        • PDF, RTF, HTML
      • Customisable Presentation
        • HTML + CSS (including user specified CSS)
        • XML + XSL (including user specified XSL – Hughes, Bird & Bow 2003 demonstrate a range of transformations for interlinear text using XSL)
        • Publisher’s Templates
        • Interface with 3rd-party XSL engines
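
The HTML + CSS option with a user-specified stylesheet could look roughly like the sketch below. The `render_html` function, the class names and the table layout are illustrative assumptions; the presentation itself points to XSL transformations (Hughes, Bird & Bow 2003) for this purpose.

```python
from html import escape


def render_html(words, translation, css_href=None):
    """Render (form, gloss) pairs as a two-row table.

    `css_href` is an optional user-specified stylesheet location.
    """
    forms = "".join(f"<td>{escape(form)}</td>" for form, _ in words)
    glosses = "".join(f"<td>{escape(gloss)}</td>" for _, gloss in words)
    link = (f'<link rel="stylesheet" href="{escape(css_href)}">'
            if css_href else "")
    return (
        f"<html><head>{link}</head><body>"
        f'<table class="interlinear">'
        f'<tr class="form">{forms}</tr>'
        f'<tr class="gloss">{glosses}</tr>'
        f"</table>"
        f'<p class="translation">{escape(translation)}</p>'
        f"</body></html>"
    )


page = render_html(
    [("ni-li-soma", "1SG-PST-read"), ("kitabu", "book")],
    "I read a book.",
    css_href="user.css",
)
```

Keeping all styling in class names means the appearance can be changed by swapping the stylesheet, without touching the generated markup.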
    14. Conclusion
      • Survey-based approach to specification of functional requirements allows us to build a best-of-breed interlinear application
      • Implementing within an open source framework eg AGTK and NLTK
      • Additional resources at: http://www.cs.mu.oz.au/research/lt/projects/interlinear
    15. Acknowledgements
      • The research reported here is supported by the National Science Foundation:
        • Grant #0094934 Electronic Metastructure for Endangered Language Data
        • Grant #998009 TalkBank
        • Grant #0317826 Querying Linguistic Databases

    Paper at LREC2004 (May 2004, Lisbon)

    © All Rights Reserved