Functional Requirements for an Interlinear Text Editor
Baden Hughes (1), Catherine Bow (1) and Steven Bird (1,2)
(1) University of Melbourne
(2) Linguistic Data Consortium, University of Pennsylvania

Overview
  • Introduction
  • Motivation
  • Selection Process
  • Evaluation Process
  • Functional Requirements
  • Conclusion

Introduction
  • Interlinear text is a highly prevalent linguistic data type, both in field linguistic data and in collated corpora

Motivation
  • Previous work has provided an open interlinear encoding standard using XML technologies and demonstrated the flexibility of such an approach (Bow, Hughes & Bird 2003; Hughes, Bird & Bow 2003)
  • Survey-based results of common functionality across a range of interlinear text handling applications
  • Motivated by the need to build a new interlinear text editing tool and a re-usable API for XML-based interlinear text

Selection Process
  • Discovered 40+ linguistically-grounded applications with at least some interlinear functionality
  • Technically-oriented selection criteria
    • end-user applications rather than application development frameworks
    • obtainable at low or zero cost
    • require only a moderate level of technology literacy to install and use
    • applications which can be used in multiple contexts rather than for a single specialised use
    • support for both unimodal and multimodal data
    • exclusion of presentation-oriented applications

Evaluation Process
  • Use of real linguistic data, motivated by the need to
    • replicate typical use patterns
    • establish a data baseline for comparison
  • Cross-platform evaluation where possible
  • Linguistically-oriented evaluation criteria from a functional perspective
    • General editing
    • Structural segmentation and alignment
    • Flexible content model
    • Import and export capability
    • Non-Roman Script / Unicode
    • Customisable presentation output

Functional Requirements
  • Seeking commonly implemented functions for working with interlinear text, and the degrees of granularity at which these functions can be implemented
  • Functions derived from previous work which has contributed to the definition of the range and type of operations performed on interlinear text (Bickford 1997; Kew & McConnell 1997; Maeda & Bird 2000; Bird et al 2002; Maeda et al 2002)
  • Functions derived from selection process
    • Application and API
    • Usable through whole project lifecycle
    • Multimodal and unimodal support
    • Cross-platform API
    • Freely redistributable
  • Functions derived from evaluation process …

General Editing Functions
  • Text selection
    • one or more constituents at morph, word, phrase level
    • differentiate content from structure: select across morph/word/phrase cells and obtain content, structure or both
  • Cut, copy & paste
    • any unit of selected text, with or without rendered orthographic support
    • combinations will facilitate split and merge type actions
    • multiple selection clipboard
  • Search
    • regular expressions
    • within selection/range
    • multiple files
    • cache of previous searches
    • result navigation within text or index
  • Replace
    • as for search, with the addition of optional replacement within text or index
  • Multiple level redo and undo

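The selection and search requirements above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Word` and `Phrase` classes, the function names, and the English sample data are hypothetical and are not taken from any existing tool or from the authors' API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    morphs: List[str]   # morph-level segments of the word
    glosses: List[str]  # one gloss per morph


@dataclass
class Phrase:
    words: List[Word]
    translation: str = ""


def select_content(phrase: Phrase, start: int, end: int) -> List[str]:
    """Content only: surface form of each word in words[start:end]."""
    return ["".join(w.morphs) for w in phrase.words[start:end]]


def select_structure(phrase: Phrase, start: int, end: int) -> List[int]:
    """Structure only: number of morph cells per selected word."""
    return [len(w.morphs) for w in phrase.words[start:end]]


def search(phrase: Phrase, pattern: str, start: int = 0,
           end: Optional[int] = None) -> List[Tuple[int, int]]:
    """Regex search over morphs, restricted to a word range.

    Returns (word index, morph index) pairs for result navigation.
    """
    rx = re.compile(pattern)
    hits = []
    for wi, word in enumerate(phrase.words[start:end], start=start):
        for mi, morph in enumerate(word.morphs):
            if rx.search(morph):
                hits.append((wi, mi))
    return hits


# Hypothetical sample: "the dogs barked"
sample = Phrase(
    words=[
        Word(["the"], ["DET"]),
        Word(["dog", "s"], ["dog", "PL"]),
        Word(["bark", "ed"], ["bark", "PST"]),
    ],
    translation="The dogs barked.",
)
```

Keeping content and structure in separate accessors is one way to let cut, copy and paste operate on either layer; a real editor would also need a clipboard model and undo history, which this sketch leaves out.
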
Segmentation and Alignment
  • Granularity of segmentation and alignment
    • Support for morph, word or phrase segmentation
    • Annotation attachment to range of morphs, words or phrases
  • Ontology support
    • Links to discipline standard (eg GOLD)
    • Links to user specified ontologies for annotations
  • Multimodal integration
    • Any combination of: text, text + audio, text + video, audio + video, text + audio + video
    • user extensible annotation tiers
  • Cross-resource linking (eg XML ID/IDREF construct)

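As a rough illustration of attaching annotations to a range of morphs or words through ID/IDREF-style links, the sketch below resolves references with Python's standard `xml.etree.ElementTree`. The element names (`text`, `phrase`, `word`, `morph`, `annotation`) and the `refs` attribute are assumptions invented for this example, not the encoding proposed in the cited work. ElementTree also does not validate ID/IDREF types against a DTD, so the lookup is done by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<text>
  <phrase id="p1">
    <word id="w1"><morph id="m1">dog</morph><morph id="m2">s</morph></word>
    <word id="w2"><morph id="m3">bark</morph><morph id="m4">ed</morph></word>
  </phrase>
  <annotation tier="gloss" refs="m2">PL</annotation>
  <annotation tier="note" refs="w1 w2">subject-verb pair</annotation>
</text>
"""


def resolve(root):
    """Map each annotation to the text of the units it points at."""
    # Build an id -> element index once, then follow each reference.
    index = {el.get("id"): el for el in root.iter() if el.get("id")}
    resolved = []
    for ann in root.iter("annotation"):
        targets = [index[ref] for ref in ann.get("refs").split()]
        covered = ["".join(t.itertext()) for t in targets]
        resolved.append((ann.get("tier"), ann.text, covered))
    return resolved


root = ET.fromstring(DOC.strip())
links = resolve(root)
```

Because an annotation carries a list of references rather than living inside a single cell, the same mechanism covers a single morph, a word, or a span of several words.
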
Flexible Content Models
  • Incomplete annotation
    • ambiguous (multi-segment)
    • partial annotations
    • free text annotations
  • Standoff annotation
    • open format
    • non-resource dependent
    • structurally constrained and linked
  • Ontology support
    • Links to discipline standard (eg GOLD)
    • Links to user specified ontologies

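One possible reading of standoff annotation with incomplete content is sketched below: annotations point into an unmodified base text by character offsets and may carry zero, one or several candidate values. The `Standoff` class, the `status` labels and the sample data are hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Standoff:
    start: int               # character offset into the base text (inclusive)
    end: int                 # character offset into the base text (exclusive)
    tier: str                # e.g. "gloss" or a free-text "note" tier
    values: Tuple[str, ...]  # empty = not yet annotated, several = ambiguous


def covered(base: str, ann: Standoff) -> str:
    """The stretch of base text an annotation refers to."""
    return base[ann.start:ann.end]


def status(ann: Standoff) -> str:
    """Classify an annotation as missing, ambiguous or complete."""
    if not ann.values:
        return "missing"
    if len(ann.values) > 1:
        return "ambiguous"
    return "complete"


# The base text is never edited; all analysis lives in the annotations.
BASE = "dogs barked"
annotations = [
    Standoff(0, 3, "gloss", ("dog",)),
    Standoff(3, 4, "gloss", ("PL", "3SG")),  # two competing analyses
    Standoff(5, 9, "gloss", ()),             # segment known, gloss pending
    Standoff(0, 11, "note", ("check with speaker",)),
]
```

Storing candidates as a tuple lets an editor keep partial or ambiguous work in progress without forcing the user to commit to one analysis.
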
Import and Export
  • Native XML data format
    • Support for DTD or schema based XML interlinearised materials
  • Format conversion
    • Support for common interlinear formats such as
      • Shoebox/Toolbox
      • ELAN
      • TASX
      • AGTK/InterTrans
    • Parsers for SGML/HTML/XML
  • Change/Version control
    • Internal provenance tracking
    • Links to external change/version control systems eg CVS/RCS/Subversion/MKS …

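Format conversion might look roughly like the sketch below, which reads one record of backslash-marker text in the style of Shoebox/Toolbox and emits XML. The marker names (`\ref`, `\tx`, `\mb`, `\ge`, `\ft`) are assumed here as a common convention; real projects configure their own markers. The output element names are hypothetical, and tokens are split on whitespace rather than aligned by column as a full importer would need to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; marker names are an assumption, not a fixed standard.
SAMPLE = r"""
\ref 001
\tx dogs barked
\mb dog -s bark -ed
\ge dog -PL bark -PST
\ft The dogs barked.
"""


def parse_record(block: str) -> dict:
    """Collect 'marker -> value' pairs from one backslash-marker record."""
    fields = {}
    for line in block.strip().splitlines():
        if line.startswith("\\"):
            marker, _, value = line[1:].partition(" ")
            fields[marker] = value.strip()
    return fields


def to_xml(fields: dict) -> ET.Element:
    """Build a simple XML phrase element (element names are illustrative)."""
    phrase = ET.Element("phrase", id=fields.get("ref", ""))
    for tier in ("tx", "mb", "ge"):
        tier_el = ET.SubElement(phrase, "tier", name=tier)
        # Whitespace tokenisation only; column alignment is ignored here.
        for token in fields.get(tier, "").split():
            ET.SubElement(tier_el, "item").text = token
    ET.SubElement(phrase, "translation").text = fields.get("ft", "")
    return phrase


record = parse_record(SAMPLE)
phrase_xml = to_xml(record)
xml_text = ET.tostring(phrase_xml, encoding="unicode")
```
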
Non-Roman Scripts
  • Unicode from Day 1
  • Flexible encodings
    • UTF-8 and UTF-16
    • Retain support for legacy code pages
  • Rendering for NRS (non-Roman scripts)
    • Data entry using
      • Native keyboarding
      • Glyph map
      • Unicode character codes
    • Using open-source off-the-shelf Unicode rendering tool kits rather than reimplementing
  • Directionality
    • Horizontal (L>R/R>L) support
    • Vertical (T>B/B>T) modality support

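A small sketch of three of the points above, using only the Python standard library: transcoding from a legacy code page to UTF-8 and UTF-16, data entry by Unicode character codes, and a first-strong-character guess at horizontal direction. The function names are hypothetical, `cp1251` is just one example of a legacy code page, and vertical layout is not addressed.

```python
import unicodedata
from typing import Tuple


def transcode(raw: bytes, legacy: str = "cp1251") -> Tuple[bytes, bytes]:
    """Decode legacy code-page bytes and re-encode as UTF-8 and UTF-16."""
    text = raw.decode(legacy)
    return text.encode("utf-8"), text.encode("utf-16")


def from_codepoints(spec: str) -> str:
    """Data entry by character code, e.g. 'U+05E9 U+05DC'."""
    return "".join(chr(int(token[2:], 16)) for token in spec.split())


def base_direction(text: str) -> str:
    """Guess horizontal direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # right-to-left, or Arabic letter
            return "rtl"
        if bidi == "L":          # left-to-right
            return "ltr"
    return "ltr"                 # default when no strong character is found


legacy_bytes = "мир".encode("cp1251")  # simulate text stored in a legacy code page
utf8_bytes, utf16_bytes = transcode(legacy_bytes)
hebrew = from_codepoints("U+05E9 U+05DC U+05D5 U+05DD")
```

Actual rendering of complex scripts would still be delegated to an existing toolkit, in line with the slide's preference for off-the-shelf components.
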
Presentation Output
  • Text as Image
    • Raster Formats: GIF, JPEG, TIFF, EPS
    • Vector Formats: SVG
  • Text in Presentation Format
    • PDF, RTF, HTML
  • Customisable Presentation
    • HTML + CSS (including user specified CSS)
    • XML + XSL (including user specified XSL; Hughes, Bird & Bow 2003 demonstrate a range of transformations for interlinear text using XSL)
    • Publisher's Templates
    • Interface with 3rd party XSL engines

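The HTML + CSS option with user specified CSS could be sketched as follows. The `render_html` function, the class names and the default stylesheet are assumptions for illustration; the XSL route described in Hughes, Bird & Bow 2003 is not reproduced here.

```python
from html import escape
from typing import List

# Hypothetical default stylesheet; a user-supplied one can replace it.
DEFAULT_CSS = (
    "table.igt td { padding: 0 0.5em; } "
    "tr.gloss { font-variant: small-caps; }"
)


def render_html(morphs: List[str], glosses: List[str], translation: str,
                css: str = DEFAULT_CSS) -> str:
    """Render one interlinear phrase as an aligned two-row HTML table."""
    if len(morphs) != len(glosses):
        raise ValueError("each morph needs exactly one gloss")

    def row(css_class: str, cells: List[str]) -> str:
        # Escape cell content so markup characters in the data stay literal.
        tds = "".join(f"<td>{escape(cell)}</td>" for cell in cells)
        return f'<tr class="{css_class}">{tds}</tr>'

    return (
        f"<style>{css}</style>"
        f'<table class="igt">{row("morph", morphs)}{row("gloss", glosses)}</table>'
        f'<p class="translation">{escape(translation)}</p>'
    )


page = render_html(
    ["dog", "-s", "bark", "-ed"],
    ["dog", "-PL", "bark", "-PST"],
    "The dogs barked.",
)
```

Passing a different `css` string changes the presentation without touching the data, which is the separation the customisable-presentation requirement asks for.
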
Conclusion
  • Survey-based approach to specification of functional requirements allows us to build a best-of-breed interlinear application
  • Implementing within an open source framework, eg AGTK and NLTK
  • Additional resources at: http://www.cs.mu.oz.au/research/lt/projects/interlinear

Acknowledgements
The research reported here is supported by the National Science Foundation:
  • Grant #0094934 Electronic Metastructure for Endangered Language Data
  • Grant #998009 TalkBank
  • Grant #0317826 Querying Linguistic Databases
