20 ways to mark up a sentence

Introduction to TEI/XML by examining a whole range of ways of marking up a single sentence.

Published in: Technology, News & Politics

  1. 1. 20 Ways to mark up a sentence Stuart Yeates New Zealand Electronic Text Centre
  2. 2. Who am I? <ul><li>Computer Science PhD (Waikato) by training
  3. 3. 4 years at Oxford, working mainly in TEI
  4. 4. Several years at Victoria working partly on TEI
  5. 5. Immersed in a library tradition
  6. 6. Actively building New Zealand / Māori content
  7. 7. Current member of the TEI Council </li></ul>
  8. 8. Who are you?
  9. 9. What is the TEI? <ul><li>Consortium of people and institutions
  10. 10. Heavy on: </li><ul><li>Linguists (British National Corpus, dictionaries)
  11. 11. Medievalists (EEBO)
  12. 12. Dramatists / playwrights
  13. 13. Diarists </li></ul><li>Produce a TEI schema and range of tools </li></ul>
  14. 14. 20 ways to mark up a sentence 1 <ul><li><p>John has a horse.</p>
  15. 15. New tag: <p>...</p> indicates that the text within the tag is a paragraph
  16. 16. This syntax is identical to that of HTML
  17. 17. Unlike HTML, the TEI version has specified semantics </li></ul>
  18. 18. 20 ways to mark up a sentence 2 <ul><li><p><s>John has a horse.</s></p>
  19. 19. New tag: <s>...</s> indicates that the text within the tag is a sentence </li></ul>
  20. 20. 20 ways to mark up a sentence 3 <ul><li><p>John has <phr type="noun" function="object">a horse</phr>.</p>
  21. 21. New tag: <phr>...</phr> indicates that the text within the tag is a phrase
  22. 22. The type and function attributes provide additional information </li></ul>
  23. 23. 20 ways to mark up a sentence 4 <ul><li><p xml:id="bookmark-1">John has a horse.</p>
  24. 24. The xml:id attribute provides a unique reference for this paragraph
  25. 25. In combination with the URL at which the document is found, represents a globally unique identifier
  26. 26. Used to build replacements for “Chapter 3, page 45, line 5”
  27. 27. http://www.nzetc.org/tm/scholarly/tei-GorLaws-t1-g1-t1-body1-d1-d34.html#tei-GorLaws-t1-g1-t1-body1-d1-d34 </li></ul>
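A minimal sketch of how a document URL and an xml:id combine into one identifier, using Python's standard library. The URL `http://example.org/text.html` and the helper name `global_identifier` are assumptions made for illustration; they are not part of the talk.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the fixed XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def global_identifier(document_url: str, element: ET.Element) -> str:
    """Join a document URL and an element's xml:id into a single URI."""
    return f"{document_url}#{element.get(XML_ID)}"


paragraph = ET.fromstring('<p xml:id="bookmark-1">John has a horse.</p>')

# Hypothetical document URL, used only for illustration.
print(global_identifier("http://example.org/text.html", paragraph))
```

The fragment after `#` is the same value the markup carries in xml:id, so the reference survives repagination in a way that "Chapter 3, page 45, line 5" does not.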
  28. 28. 20 ways to mark up a sentence 5 <ul><li><p><w type="noun">John</w> <w type="verb">has</w> <w type="particle">a</w> <w type="noun">horse</w><w type="stop">.</w></p>
  29. 29. New tag: <w>...</w> indicates that the text within the tag is a word
  30. 30. The type attribute tells us what kind of word </li></ul>
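One way word-level markup like this might be used, sketched with Python's standard library: count the words of each type and pull out the nouns. The variable names are illustrative only.

```python
from collections import Counter
import xml.etree.ElementTree as ET

sentence = ET.fromstring(
    '<p><w type="noun">John</w> <w type="verb">has</w> '
    '<w type="particle">a</w> <w type="noun">horse</w>'
    '<w type="stop">.</w></p>'
)

# Tally how many <w> elements carry each value of the type attribute.
type_counts = Counter(w.get("type") for w in sentence.iter("w"))

# Collect the text of every word marked as a noun, in document order.
nouns = [w.text for w in sentence.iter("w") if w.get("type") == "noun"]

print(type_counts)
print(nouns)
```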
  31. 31. 20 ways to mark up a sentence 6 <ul><li><p><name type="person" key="name-0001">John</name> has a horse.</p>
  32. 32. New tag: <name>...</name> indicates that the text within the tag is a name
  33. 33. The key attribute gives us the key into a foreign database to identify the named entity
  34. 34. Links to an authority control system </li></ul>
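A rough sketch of following the key attribute into an external database. The dictionary `AUTHORITY_RECORDS` stands in for a real authority control system and its contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an authority control database.
AUTHORITY_RECORDS = {"name-0001": "John (hypothetical authority entry)"}

paragraph = ET.fromstring(
    '<p><name type="person" key="name-0001">John</name> has a horse.</p>'
)

# Map each name as written in the text to the record its key points at.
resolved = {
    name.text: AUTHORITY_RECORDS.get(name.get("key"))
    for name in paragraph.iter("name")
}

print(resolved)
```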
  35. 35. 20 ways to mark up a sentence 7 <ul><li><p xml:lang="en">John has a horse.</p>
  36. 36. The xml:lang attribute indicates the language of a passage
  37. 37. “en”, “mi” very common in our collections, also “la”, “sm”, “rap”
  38. 38. Dialect, regional and temporal variations can also be encoded “en_NZ”, “mi_Trad”, “mi_Modr”
  39. 39. Used for searching and analysis </li></ul>
  40. 40. 20 ways to mark up a sentence 8 <ul><li><p xml:lang="en">John has a horse called <foreign xml:lang="mi">rā</foreign>.</p>
  41. 41. New tag: <foreign>...</foreign> indicates that the text within the tag is in a foreign language
  42. 42. “Foreign” defined relative to the surrounding text not the broader context.
  43. 43. The spectrum between foreign words and loan words is complex </li></ul>
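A sketch of how language tagging could feed searching and analysis: split a passage into runs of text, each labelled with the language in effect. It assumes xml:lang is inherited from the nearest enclosing element that sets it; `text_by_language` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_by_language(element, inherited=None):
    """Return (language, text) runs, inheriting xml:lang from ancestors."""
    lang = element.get(XML_LANG, inherited)
    runs = []
    if element.text:
        runs.append((lang, element.text))
    for child in element:
        runs.extend(text_by_language(child, lang))
        # Text after a child belongs to the parent, so it takes the parent's language.
        if child.tail:
            runs.append((lang, child.tail))
    return runs


paragraph = ET.fromstring(
    '<p xml:lang="en">John has a horse called '
    '<foreign xml:lang="mi">rā</foreign>.</p>'
)
runs = text_by_language(paragraph)
print(runs)
```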
  44. 44. 20 ways to mark up a sentence 9 <ul><li><p xml:lang="en" corresp="#s001">John has a horse.</p>
  45. 45. <p xml:lang="mi" xml:id="s001">Nō Hone tētahi hōiho.</p>
  46. 46. The corresp attribute links together two similar passages
  47. 47. Used when there are multiple representations below the work level </li></ul>
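A sketch of resolving a corresp pointer to the passage it names. The wrapping `<body>` element is an assumption added so that both paragraphs sit in one parseable document, and `resolve` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <body> wrapper is an assumption; the slide shows only the two paragraphs.
document = ET.fromstring(
    '<body>'
    '<p xml:lang="en" corresp="#s001">John has a horse.</p>'
    '<p xml:lang="mi" xml:id="s001">Nō Hone tētahi hōiho.</p>'
    '</body>'
)

# Index every element that carries an xml:id.
by_id = {el.get(XML_ID): el for el in document.iter() if el.get(XML_ID)}


def resolve(pointer: str) -> ET.Element:
    """Follow a local '#id' pointer to the element with that xml:id."""
    if not pointer.startswith("#"):
        raise ValueError(f"not a local pointer: {pointer}")
    return by_id[pointer[1:]]


# Pair each passage with the passage its corresp attribute points to.
pairs = [
    (p.text, resolve(p.get("corresp")).text)
    for p in document.iter("p")
    if p.get("corresp")
]
print(pairs)
```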
  48. 48. 20 ways to mark up a sentence 10 <ul><li><p>John has a <add>horse</add> <del>dog</del>.</p>
  49. 49. New tag: <add>...</add> indicates that the text within the tag is an addition
  50. 50. New tag: <del>...</del> indicates that the text within the tag is a deletion </li></ul>
  51. 51. 20 ways to mark up a sentence 11 <ul><li><handNote xml:id="mary">Mary Smith</handNote> <handNote xml:id="jane">Jane Smith</handNote> ... <p>John has a <add hand="#mary">horse</add><del hand="#jane">dog</del>.</p>
  52. 52. New tag: <handNote>...</handNote> identifies a creator in a fine-grained sense
  53. 53. The hand attribute assigns responsibility for a tag
  54. 54. The '#' character indicates a pointer to an xml:id
  55. 55. #, xml:id and xml:lang work in all modern XML </li></ul>
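A sketch of attributing each addition and deletion to a person by following the hand pointer. The `<doc>` wrapper is an assumption that replaces the elided material between the handNote elements and the paragraph.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The <doc> wrapper is hypothetical; it only makes the fragment parseable.
document = ET.fromstring(
    '<doc>'
    '<handNote xml:id="mary">Mary Smith</handNote>'
    '<handNote xml:id="jane">Jane Smith</handNote>'
    '<p>John has a <add hand="#mary">horse</add>'
    '<del hand="#jane">dog</del>.</p>'
    '</doc>'
)

# Map each xml:id declared on a handNote to the name it records.
hands = {h.get(XML_ID): h.text for h in document.iter("handNote")}

# For every element with a hand attribute, strip the '#' and look up the name.
changes = [
    (el.tag, el.text, hands[el.get("hand")[1:]])
    for el in document.iter()
    if el.get("hand")
]
print(changes)
```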
  56. 56. 20 ways to mark up a sentence 12 <ul><li><p>John has <unclear reason="illegible" hand="#mary"/> horse.</p>
  57. 57. New tag: <unclear/> indicates that the text within the tag is unclear or uncertain
  58. 58. Responsibility can be assigned using the hand attribute </li></ul>
  59. 59. 20 ways to mark up a sentence 13 <ul><li><floatingText> <p>John has a horse.</p> </floatingText>
  60. 60. New tag: <floatingText/> indicates a text quoted or embedded within another
  61. 61. Treaty of Waitangi </li></ul>
  62. 62. 20 ways to mark up a sentence 14 <ul><li><lg><l>John has a horse,</l><l>of course, of course</l></lg>
  63. 63. New tag: <lg/> indicates a line group
  64. 64. New tag: <l/> indicates a line
  65. 65. Poetry, drama, song, liturgy, etc </li></ul>
  66. 66. 20 ways to mark up a sentence 15 <ul><li><lg><l>John has a <rhyme label="a">horse</rhyme>,</l><l>of <rhyme label="a">course</rhyme>, of <rhyme label="a">course</rhyme></l></lg>
  67. 67. New tag: <rhyme/> indicates words which are important in the rhyme scheme
  68. 68. The label attribute is used to mark which sets of words rhyme with which others
  69. 69. Can use IPA labels </li></ul>
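A sketch of grouping rhyming words by their label, using a well-formed version of the line group in which every rhyme element is explicitly closed. The variable names are illustrative.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

line_group = ET.fromstring(
    '<lg>'
    '<l>John has a <rhyme label="a">horse</rhyme>,</l>'
    '<l>of <rhyme label="a">course</rhyme>, '
    'of <rhyme label="a">course</rhyme></l>'
    '</lg>'
)

# Words that share a label are taken to rhyme with one another.
rhyme_sets = defaultdict(list)
for rhyme in line_group.iter("rhyme"):
    rhyme_sets[rhyme.get("label")].append(rhyme.text)

print(dict(rhyme_sets))
```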
  70. 70. 20 ways to mark up a sentence 16 <ul><li><lg met="..."><l>John has a horse,</l><l>of course, of course</l></lg>
  71. 71. The attribute met indicates the metrical information for the line group </li></ul>
  72. 72. 20 ways to mark up a sentence 17 <ul><li><p><when xml:id="time-maker-9456"/>John has a horse.</p>
  73. 73. New tag: <when/> is used to synchronise media, either parallel texts or with audio, video, choreography, etc, etc.
  74. 74. Can use symbolic identifiers or relative times or absolute times </li></ul>
  75. 75. 20 ways to mark up a sentence 18 <ul><li><p>John has a <g ref="#mangled-h">h</g>orse.</p>
  76. 76. New tag: <g/> indicates a glyph which is special in some way.
  77. 77. Illuminated manuscripts, misprints, custom characters, etc., etc. </li></ul>
  78. 78. 20 ways to mark up a sentence 19 <ul><li><sp><speaker>Mary</speaker> <l>John has a horse.</l></sp>
  79. 79. New tag: <sp/> indicates a passage of speech
  80. 80. New tag: <speaker/> indicates who is speaking </li></ul>
  81. 81. 20 ways to mark up a sentence 21 <ul><li><bibl><author>Mary Smith</author><title>John has a horse</title></bibl>
  82. 82. New tag: <bibl/> marks a bibliographic reference
  83. 83. <author/> and <title/> hopefully self-explanatory
  84. 84. <biblStruct/> used when complete bibliographic information is available
  85. 85. Can be stored in the header and referenced in the document body </li></ul>
  86. 86. 20 ways to mark up a sentence 22 <ul><li><ab>John<anchor xml:id="id0034"/> has <seg>a horse</seg>.</ab>
  87. 87. New tag: <ab/> is a tag like a paragraph but without the semantic baggage of a paragraph
  88. 88. New tag: <seg/> spans zero or more characters but makes no semantic assumptions about those characters
  89. 89. New tag: <anchor/> is a point in the text whose marking makes no semantic assumptions </li></ul>
  90. 90. 20 ways to mark up a sentence 23 <ul><li><xinclude:include href="Johns_horse.xml" />
  91. 91. New tag: <xinclude:include/> includes content from a different place
  92. 92. Local files, web downloads, RSS feeds, lifting quotes directly from original, etc.,
  93. 93. Reuse, unreasonably large files, separation of fixed and dynamic content, etc., </li></ul>
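A sketch of expanding an include with Python's `xml.etree.ElementInclude`. The `<div>` wrapper, the in-memory `FILES` table, and the contents given for `Johns_horse.xml` are assumptions for illustration; a real setup would load from disk or the network.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the file system: href -> file contents.
FILES = {"Johns_horse.xml": "<p>John has a horse.</p>"}


def loader(href, parse, encoding=None):
    """Load an included resource from FILES instead of from disk."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


# The <div> wrapper is an assumption; it also declares the XInclude namespace.
document = ET.fromstring(
    '<div xmlns:xinclude="http://www.w3.org/2001/XInclude">'
    '<xinclude:include href="Johns_horse.xml"/>'
    '</div>'
)

# Replace each include element in place with the content it points to.
ElementInclude.include(document, loader=loader)

print(ET.tostring(document, encoding="unicode"))
```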
  94. 94. 20 ways to mark up a sentence 24 <ul><li><entry><form><orth>John has a horse</orth></form></entry>
  95. 95. New tags: <entry><form><orth> mark up compilation entries
  96. 96. Dictionaries, encyclopedia, lexicons, word lists, bi-lingual dictionaries, etc., etc., </li></ul>
  97. 97. 20 ways to mark up a sentence 25 <ul><li><note place="side" type="gloss">John has a horse</note>
  98. 98. New tag: <note/> used for notes
  99. 99. Footnotes, sidenotes, endnotes, etc., etc.,
  100. 100. Can be nested to arbitrary depth
  101. 101. Can be painful to render </li></ul>
  102. 102. 20 ways to mark up a sentence 26 <ul><li><char xml:id="glyph-1"><charName>John has a horse</charName></char>
  103. 103. New tag: <char/> introduces a new character
  104. 104. New tag: <charName/> names the new character
  105. 105. Invented languages, minority languages, bizarre print practices, etc.,
  106. 106. Can be painful to render </li></ul>
  107. 107. 20 ways to mark up a sentence 27 <ul><li><p>John has <pb corresp="#img5.jpg"/> a horse.</p>
  108. 108. New tag: <pb/> indicates the placement of a page break, often links to page image
  109. 109. Milestone tag so it doesn't interrupt logical structure </li></ul>
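A sketch of why a milestone is convenient: the paragraph stays one logical unit, yet its text can still be cut into pages wherever a page break occurs. `pages_of` is an illustrative helper, not part of any TEI toolkit.

```python
import xml.etree.ElementTree as ET


def pages_of(element):
    """Split an element's text into pages at each <pb/> milestone."""
    pages = [""]

    def walk(node):
        if node.tag == "pb":
            pages.append("")  # a page break starts a new page
        else:
            pages[-1] += node.text or ""
            for child in node:
                walk(child)
                # Text following a child continues on the current page.
                pages[-1] += child.tail or ""

    walk(element)
    return pages


paragraph = ET.fromstring('<p>John has <pb corresp="#img5.jpg"/> a horse.</p>')
pages = pages_of(paragraph)
print(pages)
```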
  110. 110. Questions? <ul><li>Which pub are we heading to? </li></ul>