Encoding and Presenting Interlinear Text Using XML Technologies
Paper at ALTW2003 (December 2003, Melbourne)

Transcript

  • 1. Encoding and Presenting Interlinear Text Using XML Technologies
    Baden Hughes, Steven Bird, Catherine Bow
    University of Melbourne
    Australasian Language Technology Workshop
    December 10, 2003
  • 2. Introduction / Outline
    • What is interlinear text?
    • EMELD Interlinear Text Model
    • XML Representation
    • Interlinear Text Styles
    • XSL Rendering
    • Prototype & Implementation
    • Future Research
  • 3. What is interlinear text?
    • A standard presentational form for displaying a source text aligned with a variety of linguistic annotations
      • may include phonological, morphological, syntactic analyses, glosses, translations, comments
    • Variations in structure, alignment, display styles, mapping, wrapping, etc.
    • Typical example of a three-line text:
    Yidinj (Dixon 1977)
  • 4. Interlinear text samples (contd)
    • Nivkh (Comrie 1981)
    [Figure labels: text, metadata, notes, free translation]
  • 5. Interlinear Text Samples
    sh v3.0 485 SE Text
    itm kalsrap.mov
    t Story from tape 20001bx told by Kalsarap Namaf.
    aud kalsrap.mov
    as 0
    ae 13.0002
    x Akit tumaui tae esan ipi, go
    mr akit tu- mau tae esan i - pi go
    mg 1plincS 1plincRS- all know place 3sgRS - be and
    POS pron pron- quantifier vambi n pron - v conj
    fg We all know that place, and this Litrapong…
    fgb Yumi evriwan isave ples ia. Mo Litrapong (Lisepsep) ia.
    South Efate (Namaf, 2001)
  • 6. EMELD Interlinear Text Model
    [Diagram: a tree in which a TEXT contains PHRASEs, each PHRASE contains WORDs, and each WORD contains one or more M nodes (M = Morph)]
  • 7. XML Representation
    • <interlinear-text>
    • <item type="user-defined">
    • Content at the text level, such as metadata, or an unaligned transcription of the entire text, or a pointer to an unaligned audio file
    • </item>
    • <phrases>
    • Nested XML content to represent the phrasal constituents of the text
    • </phrases>
    • </interlinear-text>
    • Each level is considered an element in an XML document
  • 8. XML Representation – Yidinj text
    • <interlinear-text>
    • <item type="title"> A Yidinj Story </item>
    • <phrases>
    • <phrase>
    • <item type="number"> 99 </item>
    • <item type="gls"> Where have you come from? </item>
    • <words>
    • <word>
    • <item type="txt"> nundu </item>
    • <morphs>
    • <morph>
    • <item type="gls"> you-SA </item>
    • </morph>
    • </morphs>
    • </word>
    • <word>
    • <item type="txt"> wandam </item>
    • <morphs>
    • <morph>
    • <item type="gls"> where-ABL </item>
    • </morph>
    • </morphs>
    • </word>
    • </words>
    • </phrase>
    • </phrases>
    • </interlinear-text>
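As a minimal sketch of how this nested encoding can be read back into display rows, the following uses Python's standard `xml.etree.ElementTree`. The function name `phrase_rows` and the choice to join morph glosses with a space are assumptions made for illustration; they are not part of the EMELD model or the authors' tooling.

```python
import xml.etree.ElementTree as ET

# The Yidinj fragment from the slide, with well-formed closing tags.
YIDINJ = """\
<interlinear-text>
  <item type="title">A Yidinj Story</item>
  <phrases>
    <phrase>
      <item type="number">99</item>
      <item type="gls">Where have you come from?</item>
      <words>
        <word>
          <item type="txt">nundu</item>
          <morphs><morph><item type="gls">you-SA</item></morph></morphs>
        </word>
        <word>
          <item type="txt">wandam</item>
          <morphs><morph><item type="gls">where-ABL</item></morph></morphs>
        </word>
      </words>
    </phrase>
  </phrases>
</interlinear-text>
"""


def phrase_rows(phrase):
    """Return (source words, per-word glosses, free translation) for one <phrase>.

    Hypothetical helper: morph glosses within a word are joined with a space.
    """
    words, glosses = [], []
    for word in phrase.findall("words/word"):
        words.append(word.findtext("item[@type='txt']", default="").strip())
        morph_glosses = [
            morph.findtext("item[@type='gls']", default="").strip()
            for morph in word.findall("morphs/morph")
        ]
        glosses.append(" ".join(morph_glosses))
    # "item" only matches direct children, so this is the phrase-level gloss.
    free = phrase.findtext("item[@type='gls']", default="").strip()
    return words, glosses, free


root = ET.fromstring(YIDINJ)
rows = [phrase_rows(p) for p in root.findall("phrases/phrase")]
for words, glosses, free in rows:
    print(" ".join(words))
    print(" ".join(glosses))
    print(free)
```

Because every level carries its own `item` children, the same traversal works whether a gloss is attached to a morph, a word, or a whole phrase.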
  • 9. Interlinear Text Styles
    • Row display
    • Row styles
    • Row ordering
    • Grouping of content
    TEXT nyewøxi nyenæcyøje q syo q
    MNG (noun+Ø+vbs) (noun+n/j+acpl+cbs)(suf+genpl) (noun+jo+gbs) (suf+nompl)
    BASE nyewøxi nyenæcyøh syo
    MITA Traditional folk songs.
    (1) Tundra Nenets (Paakkan, 1997)
    Nyewºxiº nyenecyøyeq syoq.
    ancient.ABS.NOM.SG person.ABS.GEN.PL song.ABS.NOM.PL
    Traditional folk songs.
    (2) Tundra Nenets (Susoi 1990)
  • 10. Line-wrapping
    • Key presentation challenge of interlinear text
    • Complexity due to relative length of analysis & source
    Nyewºxiº nyenecyøyeq syoq
    ancient.ABS.NOM.SG person.ABS.GEN.PL song.ABS.NOM.PL
    Traditional folk songs
    • Implications for rendering technology
    [Figure panels: Hypothetical line length; Hypothetical line wrapping; Correct line wrapping]
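The wrapping problem can be sketched in a few lines of Python: rather than wrapping each row independently, aligned columns are wrapped as units so that every word stays above its gloss. The function name `wrap_interlinear` and the fixed-width character model are assumptions for illustration only; the slides do not describe the authors' actual wrapping algorithm.

```python
def wrap_interlinear(rows, width):
    """Wrap aligned rows column by column.

    rows  -- list of rows, each a list with one cell per aligned word
    width -- maximum line length in characters (code points, not display cells)
    Returns a list of blocks; each block is a list of lines, one per row.
    """
    columns = list(zip(*rows))  # one tuple of cells per aligned word
    col_widths = [max(len(cell) for cell in col) for col in columns]

    blocks, current, used = [], [], 0
    for col, col_width in zip(columns, col_widths):
        needed = col_width if not current else used + 1 + col_width
        if current and needed > width:
            # The whole column moves to the next block, never a single row.
            blocks.append(current)
            current, needed = [], col_width
        current.append((col, col_width))
        used = needed
    if current:
        blocks.append(current)

    wrapped = []
    for block in blocks:
        lines = [
            " ".join(col[r].ljust(w) for col, w in block).rstrip()
            for r in range(len(rows))
        ]
        wrapped.append(lines)
    return wrapped


source = ["Nyewºxiº", "nyenecyøyeq", "syoq"]
gloss = ["ancient.ABS.NOM.SG", "person.ABS.GEN.PL", "song.ABS.NOM.PL"]
blocks = wrap_interlinear([source, gloss], width=40)
for block in blocks:
    print("\n".join(block))
    print()
```

With a width of 40 the third word and its gloss move together to a second block. The free translation is not aligned word by word, so it would be wrapped separately as ordinary running text.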
  • 11. XSL Rendering
    • Transforms XML documents into other formats
    • Generates a variety of useful formats for
      • human consumption (e.g. HTML, PDF, JPG)
      • machine consumption (e.g. another XML format)
    • Two stages:
      • Convert XML to format specifying grouping, row ordering, styles
      • Convert XML into formatting instructions of another language
        • Conversion to XSL Formatting Objects (XSL-FO)
        • Rendering into delivery format
  • 12. XSL Formatting Objects
    • XSL-FO is an XML application that describes how pages will look when presented to a reader
    [Diagram: XML (abstract representation) + XSL (stylesheet transformation) → XSL-FO (abstract presentational format) → OUTPUT (rendered version: XML, PDF, JPG, etc.)]
  • 13. XSL Implementation
    [Diagram labels: XML UR (abstract representation); XSL 1, XSL 2, XSL 3; XML SR (surface representation); XSL FO; XML FO; XSL PUB; delivery formats PDF, HTML, RTF, SVG, JPEG; rendered in XML]
  • 14. XSL Example - phrase
    • <xsl:template match="phrase">
    • <phrase>
    • <xsl:apply-templates select="words"/>
    • <xsl:apply-templates select="item"/>
    • </phrase>
    • </xsl:template>
  • 15. XSL Example - document
    • <xsl:template match="document">
    • <document>
    • <interlinear-text>
    • <phrases>
    • <xsl:for-each select="interlinear-text/phrases/phrase/words/word">
    • <xsl:sort select="."/>
    • <phrase>
    • <words>
    • <xsl:copy-of select="."/>
    • </words>
    • </phrase>
    • </xsl:for-each>
    • </phrases>
    • </interlinear-text>
    • </document>
    • </xsl:template>
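To make the effect of the document template concrete, here is a rough Python equivalent using only the standard library: it collects every word, sorts on the word's text content, and wraps each copy in its own phrase. The function name `build_wordlist` and the sample input are hypothetical, and Python's default code-point ordering is only an approximation of how an XSLT processor collates `xsl:sort` keys.

```python
import copy
import xml.etree.ElementTree as ET


def build_wordlist(document):
    """Return a new <document> with one <phrase> per word, sorted by word text."""
    words = document.findall("interlinear-text/phrases/phrase/words/word")
    # The XSLT sort key "." is the element's string value: all descendant text.
    words.sort(key=lambda word: "".join(word.itertext()))

    out_doc = ET.Element("document")
    out_text = ET.SubElement(out_doc, "interlinear-text")
    out_phrases = ET.SubElement(out_text, "phrases")
    for word in words:
        phrase = ET.SubElement(out_phrases, "phrase")
        holder = ET.SubElement(phrase, "words")
        holder.append(copy.deepcopy(word))  # like xsl:copy-of, a deep copy
    return out_doc


# Hypothetical two-word input in the structure the template expects.
SAMPLE = (
    "<document><interlinear-text><phrases><phrase><words>"
    '<word><item type="txt">wandam</item></word>'
    '<word><item type="txt">nundu</item></word>'
    "</words></phrase></phrases></interlinear-text></document>"
)

wordlist = build_wordlist(ET.fromstring(SAMPLE))
sorted_forms = [word.findtext("item") for word in wordlist.iter("word")]
print(sorted_forms)
```

Because the output keeps the same element structure as the input, it can be passed through the same presentation stylesheets as any other interlinear text.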
  • 16. Example: Nenets interlinear (Susoi)
  • 17. Example: Nenets (Susoi) structure
  • 18. Example: Nenets (Susoi) wordlist
  • 19. Prototype
    • Underlying Data
    • Surface Display
    • Variant Display
      • Simple display types
        • Free translation as separate block
        • or separate frame for synchronised scrolling and linking
      • Complex display types
        • Metastructural display
        • Row re-ordering
        • Optional row display
        • Wordlist linkage
        • Concordance linkage
  • 20. Implementation
    • User Interface
      • Select input text, display types, output format
    • Parameterisation Logic
      • Processed by script to determine display type and result type
    • Rendering Engine
      • Combines source and option parameters to generate appropriate output type for browser to display
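The parameterisation step can be pictured as a lookup from the user's choices to an ordered chain of stylesheets. The sketch below is an assumption throughout: the stylesheet file names, the two dictionaries, and the function `plan_rendering` are invented for illustration, since the slides do not name the actual scripts or files.

```python
# Hypothetical stylesheet names; the slides do not name the actual files.
DISPLAY_STYLESHEETS = {
    "interlinear": "interlinear.xsl",
    "structure": "structure.xsl",
    "wordlist": "wordlist.xsl",
}
OUTPUT_STYLESHEETS = {
    "html": ["to-html.xsl"],
    "pdf": ["to-fo.xsl"],  # the XSL-FO result would then go to an FO renderer
}


def plan_rendering(display, output):
    """Return the ordered list of stylesheets to apply for one request."""
    if display not in DISPLAY_STYLESHEETS:
        raise ValueError(f"unknown display type: {display!r}")
    if output not in OUTPUT_STYLESHEETS:
        raise ValueError(f"unknown output format: {output!r}")
    return [DISPLAY_STYLESHEETS[display], *OUTPUT_STYLESHEETS[output]]


print(plan_rendering("wordlist", "pdf"))
```

Keeping the display choice and the output choice in separate tables mirrors the two-stage design: any display type can be combined with any delivery format without writing a stylesheet for each pair.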
  • 21.  
  • 22. Future Research
    • Architectural Extensions
      • Linguistic ontologies
      • Text mining and retrieval
      • Compatibility with other schemata
    • API for interlinear text manipulation
    • Embedding interlinear functionality in application instances
      • e.g. AGTK
  • 23. Conclusion
    • Interlinear text as a pervasive data type in linguistics
      • Various tools available to create and edit
      • Outputs tied to particular implementations
    • Need for open extensible model
      • Allows reuse of interlinear text in different output formats
      • XML-based structural encoding allows for manipulation and querying
