Introduction to Text Encoding and the Text Encoding Initiative (TEI) Richard Wisneski Head, Bibliographic/Metadata Services Kelvin Smith Library Case Western Reserve University 2009-2010
First, Some Ground Rules
Text encoding can get difficult.
It takes time, practice, and patience to learn.
We can only cover so much in this session, but there are further resources to consult afterward.
Sources to Consult
P5 Guidelines (the current “Bible” for text encoding): http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html
P5 Guidelines, esp. Appendix C: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html
Women Writers Project Guide to Scholarly Encoding: http://www.wwp.brown.edu/encoding/guide/index.html
PART 1: Overview of Text Encoding
What Is Text Encoding?
Text encoding marks up a document in XML to capture metadata (administrative, descriptive, technical, preservation) AND to represent textual features important for research.
Examples:
  The Poetess Archive
  Women Writers Online
  The Dolley Madison Digital Edition
  The Walt Whitman Archive
Quick Example
<lg>
  <head>After <del>an</del><add>the <del>unsolv’d</del></add> argument</head>
  <l><del>The</del><add><del>Coming in,</del> A group of</add> little children, and their <lb/>ways and chatter, flow in <del>upon me</del></l>
  <l>Like <add>welcome</add> rippling water o'er my <lb/>heated <add>nerves and</add> flesh.</l>
</lg>
What Text Encoding Is NOT
Text encoding does NOT attempt to provide one unique, authoritative version of a work. It often pairs the document with interpretation (markup and metadata).
Text encoding does NOT provide one static, permanent markup for a document. While alternative markup is acceptable in certain instances, there can still be incorrect markup.
Text encoding (TEI) is NOT meant to supply an encoding recommendation for every possibility; it is intended to be customized and modified within the TEI Guidelines.
Why Do Text Encoding?
To give researchers access to an electronic text that does not require special-purpose software or hardware
To analyze information: provide a standard text-encoding scheme and metadata language that accommodates searching, retrieval, etc.
To share information: provide a standard format for data interchange in humanities research
Text Encoding Allows Users To…
Tailor searching to specific genres (e.g. verse, drama, prose)
Search different formats (e.g. chronicle, diary)
Search across collections
Search by mode (e.g. satire, pastoral)
Search by historical or geographic period
Search by title, author, and subject headings
Search via structural features of the text itself, including:
  Sections
  Headings
  Paragraphs
  Quotations
  Highlighting
  Footnotes
  Captions
Who Does Text Encoding? Where Is It Found?
Digital libraries and digital archives
Anthropology and social sciences
Literary and cultural materials
Scholarly editions
Manuscript collections and descriptions
Dictionaries
Language corpora
Historical documents
Authoring
Linguistics
What Is the Text Encoding Initiative (TEI)?
Technically: a standards organization for humanities text encoding
Organizationally: an international membership consortium
Socially: a community of people and projects
Web site: http://www.tei-c.org/
PART 2: Text Encoding and XML
Text Encoding and XML
Texts are encoded using eXtensible Markup Language (XML).
XML is:
  Easy to understand
  Non-proprietary plain text:
    Human readable
    Software independent
    Hardware independent
    (Relatively) easy to write a parser for
  Widespread: well supported by commercial and open-source software
XML Documents Must Be:
Well-formed: have no syntax errors and conform to the XML specification
  <title>Little Memoirs of the Nineteenth Century</title>
  <author>George Paston</author>
Valid: satisfy the rules of a schema (a DTD, a W3C XML Schema, or a RELAX NG schema)
  If the DTD or schema says that the author name must come before the title, then the content above would be rejected as invalid, even though it is well-formed.
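The well-formedness half of this distinction can be checked with nothing more than an XML parser. The sketch below uses Python's standard library; the wrapping <bibl> element and the function name is_well_formed are assumptions added for illustration (an XML document needs a single root element). Checking validity is a separate step that needs a schema-aware validator, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML (syntax only, no schema check)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The slide's two elements, wrapped in a hypothetical <bibl> root.
good = (
    "<bibl><title>Little Memoirs of the Nineteenth Century</title>"
    "<author>George Paston</author></bibl>"
)
# Same content, but <title> is never closed: a syntax error.
bad = (
    "<bibl><title>Little Memoirs of the Nineteenth Century"
    "<author>George Paston</author></bibl>"
)

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still be invalid against a given schema, for example if the schema requires <author> before <title>.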
XML Vocabulary: Elements, Content, Attributes, Values
<titleStmt>
  <title type="m">Little Memoirs</title>
</titleStmt>
Element: title
Attribute: type
Value: m
Content: Little Memoirs
Nested: <titleStmt> is the PARENT ELEMENT; <title> is the CHILD ELEMENT of <titleStmt>.
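As a rough illustration of these four terms, the following Python sketch parses the same snippet and pulls out each part by name. The dictionary keys are simply labels chosen for this example.

```python
import xml.etree.ElementTree as ET

snippet = '<titleStmt><title type="m">Little Memoirs</title></titleStmt>'

parent = ET.fromstring(snippet)  # the <titleStmt> element
child = parent[0]                # its first (and only) child, <title>

parts = {
    "parent element": parent.tag,
    "child element": child.tag,
    "attribute": list(child.attrib)[0],
    "value": child.attrib["type"],
    "content": child.text,
}

for label, value in parts.items():
    print(f"{label}: {value}")
```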
Quick Example
<biblStruct>
  <titleStmt>
    <title level="m">Early history of the Cleveland Public Schools</title>
    <author><persName>Freese, Andrew</persName></author>
  </titleStmt>
  <extent>128 p. : ill. ; 23 cm.</extent>
  <publicationStmt>
    <!-- groups information concerning publisher, place of publication, and date of the text -->
    <pubPlace>Cleveland, Ohio</pubPlace>
    <publisher>Robison, Savage &amp; Co., Book Printers</publisher>
    <!-- contains the date of the bibliographic item's original publication, in any format; a normalized value can be given in the when attribute -->
    <date>1876</date>
  </publicationStmt>
  <notesStmt>
    <note>by Andrew Freese ; Published by order of the Board of Education.</note>
  </notesStmt>
</biblStruct>
Validity
A valid TEI document follows the rules of a schema that describes it.
The schema (or DTD) ensures that all required elements are present in the document
The schema may prevent undefined elements from being used
The schema may enforce a specific data structure
The schema may specify the use of attributes and define their possible values
The schema may define default values for attributes
An XML document can be well-formed but NOT valid
An XML document can never be valid without being well-formed
Schema Examples
Instance:
<book measure="centimeters">21</book>
Schema:
<xs:element name="book">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="measure" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

Instance:
<book bookISBN="152-32-29359535">Go Tell It on the Mountain</book>
<authorLastName>Baldwin</authorLastName>
<authorFirstName>James</authorFirstName>
Schema:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="authorLastName"/>
      <xs:element ref="authorFirstName"/>
    </xs:sequence>
    <xs:attribute ref="bookISBN" use="required"/>
  </xs:complexType>
</xs:element>
PART 3: Levels of TEI Encoding
Five Levels
The latest iteration of the TEI Guidelines is Proposal 5 (a.k.a. P5).
The current TEI Consortium Best Practices Group (BPG, formed in summer 2008) has been establishing best practices and standards for:
  TEI headers
  Level One: Fully Automated Conversion and Encoding
  Level Two: Minimal Encoding
  Level Three: Simple Analysis
  Level Four: Basic Content Analysis
  Level Five: Scholarly Encoding Projects
The BPG will present its work at the Digital Library Federation conference in early May, gather feedback, and publish a final document later in 2009.
Level 1 Encoding: Fully Automated Conversion and Encoding
Purpose: to create electronic text primarily for keyword searching and linking to page images.
The text is subordinate to the page image and is not intended to stand alone as an electronic text (without page images).
Most suitable when:
  A large volume of material is to be made available online quickly
  A digital image of each page is desired
  No manual intervention is performed in the text creation process
  The material is of interest to a large community of users who wish to read texts that allow keyword searching
  Sophisticated search and display capabilities based on the structure of the text are not necessary
Level 1 Encoding: Characteristics
<div1> or <div>: There should be only one child of <body>: a single <div> (or <div1>).
<ab>: There should be only one child of the <div> (or <div1>): a single <ab> wrapping all of the OCR text. If the text is ever “upgraded” to Level 3 or higher, the <ab> element will be replaced by structural elements such as <p> and <table>.
<pb>: Required in Level 1. Page images can be linked to the text by giving a JPEG or other image file as the value of the facs= attribute. The n= attribute records the number printed on the page.
The Task Force sees the use of METS here as a tremendous advantage. METS/TEI page-turning documentation will be included in the near future.
Level 2 Encoding: Minimal Encoding
Purpose: to create electronic text for full-text searching, linking to page images, and identifying a simple structural hierarchy to improve navigation. (For example, you can create a table of contents from such encoding.)
The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images).
Requires some human intervention to identify each textual division and heading.
Most suitable when:
  A large volume of material is to be made available online quickly
  A digital image of each page is desired
  The material is of interest to a large community of users who wish to read texts that allow keyword searching
  Rudimentary search and display capabilities based on the large structures of the text are desired
  Each text is checked to ensure that divisions and headers are properly identified
Level 2 Encoding: Characteristics
All elements specified in Level 1 plus the following:
<front>, <back>: Optional.
<div1> or <div>: If no type= attribute is specified, a type= value of "section" should be presumed.
<head>: Required if present.
<ab>: At least one container element is required.
<fw>: Running heads; can be automatically generated.
P5 Level 2 Encoding Template
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="text"> [stuff] </teiHeader>
  <text>
    <front>
      [title page information, table of contents, prefaces, etc.]
      [optional]
    </front>
    <body>
      <div type="section">
        <pb n="1" facs="[URI of page 1 image]"/>
        <head> [heading of section 1] </head>
        <ab> [entire contents of section 1 here, with interspersed <pb/> elements pointing to page images; in this example there are 26 more pages to section 1] </ab>
      </div>
      <div type="section">
        <pb n="27" facs="[URI of page 27 image]"/>
        <div type="subsection">
          <head>[heading of section 2 subsection 1]</head>
          <ab>[all the paragraphs of subsection one go here with page breaks inserted] </ab>
        </div>
      </div>
    </body>
    <back> [optional] </back>
  </text>
</TEI>
P5 Level 2 Encoding Example
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
<TEI xml:id="someid" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader> [Source and processing information goes here] </teiHeader>
  <text>
    <body>
      <div1>
        <pb n="113" facs="00000001.tif"/>
        <head>POINT VIII: BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED.</head>
        <ab> POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; …
          <pb n="114" facs="00000002.tif"/> on the Hiss mail in 1945, …
          <pb n="115" facs="00000003.tif"/> occurred from December 13, 1945 until the Hisses moved from Washington, D.C. to New York City on September 13, 1947. …
        </ab>
      </div1>
    </body>
  </text>
</TEI>
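Because Level 1 and Level 2 texts exist mainly to link text to page images, a useful sanity check is to list every <pb> with its n= and facs= values. The Python sketch below does this for a trimmed-down sample modeled on the example above. The sample omits the <teiHeader> and so is not a complete TEI document, and the function name page_images is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Trimmed sample: same page-break pattern as the slide, header omitted.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div1>
    <pb n="113" facs="00000001.tif"/>
    <ab>first page text <pb n="114" facs="00000002.tif"/> second page text</ab>
  </div1></body></text>
</TEI>"""


def page_images(tei_text):
    """Return (page number, image file) pairs for every <pb>, in document order."""
    root = ET.fromstring(tei_text)
    # Elements in the TEI namespace are addressed as {namespace}localname.
    return [(pb.get("n"), pb.get("facs")) for pb in root.iter(f"{{{TEI_NS}}}pb")]


for number, image in page_images(sample):
    print(number, image)
```

A page break with no facs= value shows up as None in the second position, which makes missing image links easy to spot.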
Level 3 Encoding: Simple Analysis
Purpose: to create text that can stand alone as electronic text
Identifies hierarchy (logical structure) and typography, without content analysis being of primary importance
Features to be encoded are determined by the logical structure and appearance of the text
The text can stand alone without page images
Most suitable when:
  Some sophistication of display, delivery, and searching based on the structure of the text is desired
  Texts will be checked to ensure that encoding decisions have been made appropriately
  The material is of interest to a large community of users who wish to read texts that allow keyword searching
Level 3 Encoding: Characteristics
All elements specified in Levels 1 and 2 plus the following:
<front>, <back>: Required if present.
<div>: Required if present; type attribute is recommended.
<floatingText>: Recommended if present.
<p>: Required for paragraph breaks in prose.
<lg> and <l>: Required for identifying groups of lines and lines, respectively.
<list> and <item>: May be used at this level to indicate ordered and unordered list structures.
<table>, <row>, and <cell>: May be used to indicate table structures.
<figure>: Required to indicate figures other than page images.
<hi>: Required to indicate changes in typeface; rend attribute is optional.
<note>: All notes must be encoded. It is also recommended that notes that extend beyond one page be combined into one <note> element. Marginal notes without a reference should be placed at the beginning of the paragraph to which they refer, with "margin" as the value of the place attribute.
Level 3 Encoding: General Recommendations
Front matter
  <div type="contents">: Use lists to mark up the table of contents, with the <ptr> tag used to reference the starting page number. The <ptr> tag can reference the <pb> identifier OR an identifier (e.g., @xml:id) placed in the corresponding division of text.
Body
  <note>: Inline. The note is inserted at the point of reference. An n attribute records the value of the note reference, if there is one.
Back
  <div type="index">: Use lists to mark up index entries, with the <ref> tag used to reference the corresponding page number. Add the target attribute (@target) to reference the <pb> identifier so that links can be generated from the index into the text proper.
Running heads, catchwords, and other such forme work should NOT be included in Level 3, with the exception of page numbers, which are recorded using <pb>.
Level 3 Encoding: Prose Example
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
  <teiHeader> [stuff] </teiHeader>
  <text>
    <front>
      <div type="frontispiece">[figure]</div>
      <titlePage>[text]</titlePage>
      <div type="dedication">[text]</div>
      <div type="contents">[text]</div>
    </front>
    <body>
      <div type="book">
        <head>[book title]</head>
        <div type="chapter"> <pb n="3" xml:id="freear-p03"/>[text] </div>
        <div type="chapter"> <pb n="12" xml:id="freear-p12"/>[text] </div>
        <div type="chapter">[text]</div>
      </div>
    </body>
    <back>
      <div type="appendix">[text]</div>
      <div type="index">[text]</div>
    </back>
  </text>
</TEI>
Table of Contents:
<!-- @target references page break identifier -->
<div type="contents">
  <head>CONTENTS</head>
  <list type="simple">
    <item>I. A Boy and His Dog <hi rend="right">3</hi> <ptr target="#freear-p03"/></item>
    <item>II. Romance <hi rend="right">12</hi> <ptr target="#freear-p12"/></item>
  </list>
</div>
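Table-of-contents links only work if every target= value matches an xml:id somewhere in the document. One possible way to check this, sketched in Python with the standard library, is shown below. The sample is reduced to the relevant elements, the third <item> and its target are hypothetical entries added to show a failure, and the function name unresolved_targets is an assumption.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

sample = """<text>
  <front><list>
    <item>I. A Boy and His Dog <ptr target="#freear-p03"/></item>
    <item>II. Romance <ptr target="#freear-p12"/></item>
    <item>III. (hypothetical entry) <ptr target="#freear-p99"/></item>
  </list></front>
  <body>
    <div><pb n="3" xml:id="freear-p03"/></div>
    <div><pb n="12" xml:id="freear-p12"/></div>
  </body>
</text>"""


def unresolved_targets(xml_text):
    """Return local targets (#id) that match no xml:id in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        target = el.get("target")
        if target and target.startswith("#") and target[1:] not in ids:
            missing.append(target)
    return missing


print(unresolved_targets(sample))
```

The check only covers local pointers that begin with "#"; targets that point to other files or URLs are ignored.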
Level 3 Encoding: Verse Example
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
  <teiHeader> [stuff] </teiHeader>
  <text>
    <front>
      <titlePage>[text]</titlePage>
      <div type="dedication">[text]</div>
      <div type="contents">[text]</div>
    </front>
    <body>
      <div type="book">
        <head>[book title]</head>
        <div type="part">
          <head>[section title]</head>
          <div type="poem">
            <head>THE DAYS GONE BY.</head>
            <lg>
              <l n="1">O the days gone by! O the days gone by!</l>
              <l n="2">The apples in the orchard, and the pathway through the rye;</l>
              <l n="3">The chirrup of the robin, and the whistle of the quail</l>
              <l n="4">As he piped across the meadows sweet as any nightingale;</l>
            </lg>
            <lg>[lines of poetry]</lg>
            <lg>[lines of poetry]</lg>
          </div>
        </div>
      </div>
    </body>
  </text>
</TEI>
Level 4 Encoding: Basic Content Analysis
Purpose: to create text that can stand alone as electronic text
Identifies hierarchy and typography
Specifies the function of textual and structural elements
Describes the nature of the content and not merely its appearance
Features of the text that may contribute to meaning, such as indentation of verse lines and typographic change, are preserved
Most suitable when:
  Sophisticated search and retrieval capabilities are desired
  Texts will be used for textual analysis
  Users of the texts may have limited storage or display capabilities
Level 4 Encoding: Characteristics
All elements specified in Levels 1, 2, and 3 plus the following:
<titlePage> and child elements: Required if present.
<group>: Required to encode a collection of independent texts that are regarded as a single group for processing or other purposes.
<emph>, <foreign>, <gloss>, <term>, or <title>: Recommended to identify typographically distinct text.
<epigraph>, <quote>, <said>, <mentioned>, or <soCalled>: Recommended to represent speech, thought, quotation, etc.
<sic>, <corr>, or <choice>: Recommended to encode errors or typos.
<add>, <del>, <gap>, and <unclear>: Recommended to encode material that is omitted, added, marked for deletion, or is illegible, invisible, or inaudible.
<opener>, <dateline>, <salute>, <closer>, <signed>, <postscript>: Required to indicate specific parts of letters.
<sp>, <speaker>, and <stage>: Required to encode different dramatic structures.
<sp> and <speaker>: Required to encode oral history interviews.
Et cetera; see the TEI BPG Guidelines.
Example of Level 4 Encoding
<p>But it is well authenticated by the observation of every one, that
  <del rend="overstrike" hand="JHL">their manner</del>
  <add rend="sup" hand="JHL">this way—i.e. the above</add>
  of writing influences the style of compos. of those who practise it considerably, when they grow up to years of manhood; for their productions,
  <del hand="JHL" rend="overstrike">instead</del>
  far from being terse, argumentative, convincing, are without head or tail &amp; are generally an incongruous mass mixed up in the most disgusting manner, without divisions or heads &amp; in short without a subject (so to speak).</p>
Level 5 Encoding:  Scholarly Encoding Projects Level 5 texts are those that require subject knowledge, and encode semantic, linguistic, prosodic, or other elements beyond a basic structural level
Example: Variant Readings in Level 5
<l>So hath myn
  <app>
    <lem wit="#msB #msC">herte</lem>
    <rdg wit="#msA">hert</rdg>
    <rdg wit="#msD">minde</rdg>
    <rdg wit="#msE">mynde</rdg>
  </app>
  Caught in remembraunce</l>
<app>: apparatus (critical apparatus)
<lem>: lemma, or base text
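To see what this apparatus buys a researcher, the readings can be tabulated by witness. The Python sketch below is one way to do that for this single line; the function name readings_by_witness is an assumption, and the code treats the lemma as just another reading attested by its listed witnesses.

```python
import xml.etree.ElementTree as ET

line = """<l>So hath myn <app>
  <lem wit="#msB #msC">herte</lem>
  <rdg wit="#msA">hert</rdg>
  <rdg wit="#msD">minde</rdg>
  <rdg wit="#msE">mynde</rdg>
</app> Caught in remembraunce</l>"""


def readings_by_witness(l_text):
    """Map each witness sigil (without the leading #) to the text it attests."""
    root = ET.fromstring(l_text)
    table = {}
    for reading in root.iter():
        if reading.tag in ("lem", "rdg"):
            # wit= holds a space-separated list of pointers such as "#msB #msC".
            for wit in reading.get("wit", "").split():
                table[wit.lstrip("#")] = reading.text
    return table


for witness, text in sorted(readings_by_witness(line).items()):
    print(witness, text)
```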
General Recommendations
An encoding project should strive for internal consistency and for use of standards, so that the data can be modified or enhanced in the future with ease.
When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work.
Certain features of the text, such as publisher's advertisements or indexes, should be included as links to page images.
Any omissions of material found in the original work should be noted in the <editorialDecl> in the TEI header.
An encoding project should use only numbered divisions (i.e., <div1>, <div2>, etc.) or only unnumbered divisions (i.e., <div>), but not both.
Whether numbered or unnumbered divisions are used, the @type attribute of the division element is not recommended at Level 1, is optional at Level 2, is recommended at Level 3, and is required at Levels 4 and 5.
Page breaks should be encoded using the <pb> element, which should mark the top of a page (i.e. the text of page seven should immediately follow <pb n="7"/>), and should always be contained within a div for ease of retrieval with indexing software.
PART 4: Short Practice in Text Encoding
Author: James Wallen
Title: Cleveland’s Golden Story
Publishing Place and Publisher: [Cleveland, OH]: Wm. Taylor Son & Co.
Year: 1920
93 pp.
CONTENTS
Chapter 1. The Kingdom of Gold. 1
Chapter 2. Lincoln-Hearted Men 9
Chapter 3. Taming the Wilderness 19
[TEXT, ETC.]
Chapter 1: The Kingdom of Gold Gold is the symbol of adventure—the unresting urge that stirs men’s souls. Francois de Orlenna, who crossed the South American continent from ocean to ocean in 1540, wrote, “Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.” His catalog of iritations included: 1. The weather 2. The peacocks 3. His meagre grasp of  Hamlet, Prince of Denmark Chapter Heading and Paragraph
P5 Level 2 Encoding
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="text"> [stuff goes here] </teiHeader>
  <text>
    <front>
      <list>
        <item>Chapter 1. The Kingdom of Gold. 1</item>
        <item>Chapter 2. Lincoln-Hearted Men. 9</item>
        [ETC.]
      </list>
    </front>
    <body>
      <div type="section">
        <pb n="1" facs="p1.jpg"/>
        <head>The Kingdom of Gold</head>
        <ab> [a whole section is contained within this anonymous block tag, interspersed with <pb> elements pointing to page images]
          <pb xml:id="p21198-zz0002mpwb" n="2"/>
        </ab>
      </div>
    </body>
    <back> [optional] </back>
  </text>
</TEI>
P5 Level 3 Encoding
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader type="text">[stuff goes here]</teiHeader>
  <text>
    <front>
      <pb n="1" xml:id="walcle01-00"/>
      <div type="contents">
        <list>
          <item>Chapter 1. The Kingdom of Gold. <hi rend="right">1</hi> <ptr target="#walcle01-p1"/></item>
          <item>Chapter 2. Lincoln-Hearted Men. <hi rend="right">9</hi> <ptr target="#walcle01-p9"/></item>
          [ETC.]
        </list>
      </div>
    </front>
    <body>
      <div type="chapter">
        <pb n="1" xml:id="walcle01-p1"/>
        <head type="main">Chapter 1</head>
        <head type="subtitle">The Kingdom of Gold</head>
        <p> [FIRST PARAGRAPH GOES HERE] </p>
      </div>
    </body>
    <back> [optional] </back>
  </text>
</TEI>
<p> Gold is the symbol of adventure—the unresting urge that stirs men’s souls. Francois de Orlenna, who crossed the South American continent from ocean to ocean in 1540, wrote, “Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.” </p> <p> His catalog of iritations included: <list> <item>  1. The weather </item> <item>  2. The peacocks  </item> <item>  3. His meagre grasp of  <hi> Hamlet, Prince of Denmark </hi>   </item> </list> </p> P5 Level 3 Continued
P5 Level 4 Encoding
<p>Gold is the symbol of adventure—the unresting urge that stirs men’s souls. <name type="person" key="FDO1">Francois de Orlenna</name>, who crossed the South American continent from ocean to ocean in <date when="1540">1540</date>, wrote, <q>Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.</q></p>
<p>His catalog of <choice><sic>iritations</sic><corr>irritations</corr></choice> included:
  <list>
    <item>1. The weather</item>
    <item>2. The peacocks</item>
    <item>3. His meagre grasp of <hi><bibl><title ref="#hamlet1">Hamlet, Prince of Denmark</title></bibl></hi></item>
  </list>
</p>
…
<biblStruct xml:id="hamlet1">
  <monogr>
    <author>Shakespeare, William</author>
    <title>Hamlet, Prince of Denmark</title>
    <date/>
  </monogr>
</biblStruct>
PART 5: TEI Header
TEI Header
Provides administrative, descriptive, and preservation metadata
Administrative: Who created the metadata? When was it created? Where is the original item located? Etc.
Descriptive: title, author, publication info, subject headings, number of pages, etc.
Preservation: file size, identifier, format, etc.
Basic Components of TEI Header
Electronic Version Information
  Information about the ELECTRONIC version of the work(s)
Electronic Distributor Information
  Information about the publisher of the ELECTRONIC version of the work(s)
  E.g. William Taylor & Co. published the original work, but Kelvin Smith Library is publishing the electronic version
Original Document Bibliographic Information
  Bibliographic information for the text from which the electronic version was derived. May be generated from a MARC record (but does not have to be).
Encoding Description
  Includes project description, encoding level declaration, what classification structure is used (e.g. LCSH), etc.
Profile Description
  Includes text language, subject terms
Revision Description
  If any revision was made to the TEI document, it is recorded here, including revision details, the party or parties involved, and date(s)
TEI Header (continued)
Can reflect a text center’s standards and serve as the basis for other types of metadata system records
Can function in detached form as records in a catalog, as a title page inherent to the document, or as a source for index displays
May describe a collection of documents, a single item, or a portion of an item
A TEI header does NOT necessarily have a one-to-one correspondence with a MARC record. One TEI header may have multiple MARC analytic records, or one MARC record may describe a collection of TEI documents with individual headers.
May contain historical background on how the file has been treated, extending the information of a classic catalog record
There is no ONE header template. Modification is needed depending on the project and text type.
Example: MARC to TEI Header
LEADER 00000nam 2200000Ia 4500
001 49237829
003 OCoLC
005 20020305071435.0
008 020305s1905 ohu r 000 0 eng d
040 CWR|cCWR
049 CWRR
090 BJ1161|b.G6 1905a
100 1 Given, Charles Stewart
245 12 A fleece of gold :|bfive lessons from the fable of Jason and the Golden Fleece /|cby Charles Stewart Given
260 Cincinnati [Ohio] :|bJennings and Graham ;|aNew York [N.Y.] :|bEaton and Mains,|cc1905
300 103 p. ;|c18 cm
533 Photocopy.|bLaCrosse, Wis. :|cBrookhaven Press : digital production by Northern Micrographics, Inc.,|d2001.|e18 cm
650 0 Success
650 0 Conduct of life
650 0 Jason (Greek mythology)

<sourceDesc>
  <biblStruct>
    <titleStmt>
      <title type="main">A fleece of gold</title>
      <title type="sub">five lessons from the fable of Jason and the Golden Fleece</title>
      <!-- subheading [if applicable] -->
      <author>
        <persName>Given, Charles Stewart</persName>
      </author>
    </titleStmt>
    <extent>103 p.</extent>
    <publicationStmt>
      <!-- groups information concerning publisher, place of publication, and date of the text -->
      <pubPlace>Cincinnati [Ohio]</pubPlace>
      <publisher>Jennings and Graham</publisher>
      <date>1905</date>
      <idno>BJ1161 .G6 1905a</idno>
    </publicationStmt>
  </biblStruct>
</sourceDesc>
…
<profileDesc>
  <textClass>
    <keywords scheme="LCSH">
      <!-- if the keywords come from a controlled vocabulary, it can be identified by the scheme attribute -->
      <term>Success</term>
      <term>Conduct of life</term>
      <term>Jason (Greek mythology)</term>
    </keywords>
  </textClass>
</profileDesc>
Session 2: Text Encoding and the Text Encoding Initiative (TEI)
PART 6: Basic Text Type Structures
Introduction
Before encoding a text, skim through it and mark out its structure. For books, take note of:
  Volumes
  Parts
  Chapters
  Section breaks
  Table of contents
  Front matter (title page, dedication, epigraph, preface, introduction, etc.)
  End matter (indices, endnotes, bibliography, glossary, appendices, colophon, etc.)
Sample Book Structure This book has: A. An Introduction, B. Two chapters – each with a heading  and two sections – and  C. an Index
Book: Tree Diagram
Basic Document Structure: Un-numbered Divisions
<text>
  <front>
    <div type="preface"> <!-- ... --> </div>
    <div type="introduction"> <!-- ... --> </div>
  </front>
  <body>
    <div type="chapter" n="1">
      <head type="main">Chapter 1</head>
      <head type="subtitle">Wines</head>
      <div type="section">White wines ... </div>
      <div type="section">Red wines ... </div>
    </div>
  </body>
  <back>
    <div type="index"> <!-- … --> </div>
  </back>
</text>
Numbered Divisions within Body
<body>
  <div1 type="part" n="1">
    <div2 type="chapter" n="1">
      <!-- text of part 1, chapter 1 -->
    </div2>
  </div1>
  <div1 type="part" n="2">
    <div2 type="chapter" n="2">
      <!-- text of part 2, chapter 2 -->
    </div2>
  </div1>
</body>
The largest possible subdivision of the body is <div1> and the smallest possible is <div7>.
You CANNOT mix unnumbered and numbered divisions (i.e. <div> with <div1> etc.).
Regardless, a good practice is to use n= and/or xml:id= attributes. For example: <div type="chapter" n="1">
NOTE: n= is optional, and its value does not have to be unique (the same value may be repeated on different elements).
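The rule against mixing division styles is easy to break in a long document and easy to test for. The Python sketch below reports which styles a fragment uses; the function names and the two sample fragments are hypothetical, and the samples omit the TEI namespace to keep the tag matching simple.

```python
import re
import xml.etree.ElementTree as ET


def division_styles(xml_text):
    """Return the set of division styles used: 'unnumbered' and/or 'numbered'."""
    root = ET.fromstring(xml_text)
    styles = set()
    for el in root.iter():
        if el.tag == "div":
            styles.add("unnumbered")
        elif re.fullmatch(r"div[1-7]", el.tag):
            styles.add("numbered")
    return styles


def mixes_division_styles(xml_text):
    """True when both <div> and <div1>..<div7> appear in the same fragment."""
    return len(division_styles(xml_text)) == 2


numbered = '<body><div1 type="part"><div2 type="chapter"/></div1></body>'
mixed = '<body><div1 type="part"><div type="chapter"/></div1></body>'

print(mixes_division_styles(numbered))
print(mixes_division_styles(mixed))
```

For a namespaced TEI file, the tag comparison would need to account for the {http://www.tei-c.org/ns/1.0} prefix that the parser adds to each tag name.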
<text> <front><!-- ... --></front> <group> <text><!-- ... --></text> <text><!-- ... --></text> <text><!-- ... --></text> </group> <back><!-- ... --></back> </text> Using <group> for Editions
Drama
<div1 n="I" type="Act">
  <head>Act I</head>
  <div2 type="scene">
    <head>Scene 1</head>
    <stage type="entrance">Enter Fay</stage>
    <sp>
      <speaker>Fay</speaker>
      <p>I say, Dinah, has anyone seen my gloves?</p>
    </sp>
    <stage type="entrance">Enter Dinah</stage>
    <sp>
      <speaker>Dinah</speaker>
      <p>No, miss, perhaps the parakeet has got them again?</p>
    </sp>
    <stage type="exit">Exit Fay and Dinah</stage>
  </div2>
</div1>
Letters
<div type="letter">
  <opener>
    <dateline>
      <date when="1865-08-05">August the 5th</date>
      <name type="place">Cape Cod</name>
    </dateline>
    <salute>My dear <name type="person">Becky</name></salute>
  </opener>
  <p>How lovely the oysters are this evening!</p>
  <closer>
    <salute>Yours very truly</salute>
    <signed><name type="person">Maria</name></signed>
  </closer>
</div>
Verse
<lg type="sonnet">
  <head>On First Looking into Chapman’s Homer</head>
  <lg type="quatrain">
    <l>Much have I travell’d in the realms of gold,</l>
    <l>And many goodly states and kingdoms seen;</l>
    <l>Round many western islands have I been</l>
    <l>Which bards in fealty to <persName>Apollo</persName> hold.</l>
  </lg>
  <lg type="quatrain">
    <l>Oft of one wide expanse had I been told</l>
    <l>That deep-brow’d <persName>Homer</persName> ruled as his demesne;</l>
    <l>Yet did I never breathe its pure serene</l>
    <l>Till I heard <persName>Chapman</persName> speak out loud and bold:</l>
  </lg>
  <lg type="sestet">
    <l>Then felt I like some watcher of the skies</l>
    <l>When a new planet swims into his ken;</l>
    <l>Or like stout <persName>Cortez</persName> when with eagle eyes</l>
    <l>He star’d at the <placeName>Pacific</placeName>—and all his men</l>
    <l>Look’d at each other with a wild surmise—</l>
    <l>Silent, upon a peak in <placeName>Darien</placeName>.</l>
  </lg>
</lg>
<persName> and <placeName> are optional for Level 3; used in Level 4.
Verse Example
Warp Speed, Ms. Bright!
There was a young lady named Bright,
Who travelled much faster than light,
She departed one day,
In a relative way,
And returned on the previous night.
TEI P5 Level 5 Rendering
<?xml version="1.0" encoding="UTF-8"?>
<lg type="limerick" rhyme="aabba" n="3">
  <head>Warp Speed, Ms. Bright!</head>
  <l>There was a young lady named <rhyme label="a">Bright</rhyme>,</l>
  <l>Who travelled much faster than <rhyme label="a">light</rhyme>,</l>
  <l>She departed one <rhyme label="b">day</rhyme>,</l>
  <l>In a relative <rhyme label="b">way</rhyme>,</l>
  <l>And returned on the previous <rhyme label="a">night</rhyme>.</l>
</lg>
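Level 5 markup like this can be checked against itself: the labels on the individual <rhyme> elements should spell out the scheme declared in the rhyme= attribute of the <lg>. A minimal Python sketch of that comparison follows; the function name rhyme_labels is an assumption, and the XML declaration is left out of the sample string for simplicity.

```python
import xml.etree.ElementTree as ET

limerick = """<lg type="limerick" rhyme="aabba" n="3">
  <head>Warp Speed, Ms. Bright!</head>
  <l>There was a young lady named <rhyme label="a">Bright</rhyme>,</l>
  <l>Who travelled much faster than <rhyme label="a">light</rhyme>,</l>
  <l>She departed one <rhyme label="b">day</rhyme>,</l>
  <l>In a relative <rhyme label="b">way</rhyme>,</l>
  <l>And returned on the previous <rhyme label="a">night</rhyme>.</l>
</lg>"""


def rhyme_labels(lg_text):
    """Concatenate the label= values of every <rhyme>, in line order."""
    lg = ET.fromstring(lg_text)
    return "".join(r.get("label") for r in lg.iter("rhyme"))


declared = ET.fromstring(limerick).get("rhyme")
print(declared, rhyme_labels(limerick), declared == rhyme_labels(limerick))
```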
PART 7: Some Common Practices in Text Encoding
1. Make sure you are linked to a schema:
   <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
2. Include as many of the header elements as possible, including <editorialDecl>:
   <editorialDecl>
     <hyphenation eol="none">
       <p>Hyphens in words broken at the end of a line have been removed</p>
     </hyphenation>
   </editorialDecl>
3. Make use of spell-check: F4 key
4. Delete hyphens within all words, except in special cases (e.g. poetry, dramatic verse): "Institu-tion" becomes "Institution"
Using oXygen
5. Include <respStmt> in the header for any emendations or corrections you later make to a text that has been previously encoded:
   <respStmt>
     <name xml:id="rlw54">Richard Wisneski</name>
     <!-- one resp per respStmt -->
     <resp>TEI Header creator</resp>
     <!-- OR TEI Header and document creator -->
   </respStmt>
6. Page breaks must be inserted, using <pb n="[page number]" xml:id=""/>:
   <pb xml:id="fleboo-032" n="33"/>
   OR, if you want the page break to reference the specific page image:
   <pb facs="fleboo-032.jp2" n="33"/>
   The xml:id attribute MUST be (a) unique and (b) start with a letter
   The facs attribute MAY link to a permanent URL or URI
Common Practices (continued)
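The two xml:id rules above (unique, starting with a letter) are easy to test before validation. This is a rough sketch using only Python's standard library; the sample <pb> values after the first are invented for illustration, and ID_PATTERN is a simplified ASCII-only approximation of the real XML name rules.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified assumption: a letter or underscore, then letters, digits, '.', '-', '_'.
# The full XML rule also admits many non-ASCII characters.
ID_PATTERN = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")

# Hypothetical page breaks: one id starts with a digit, one is repeated.
SAMPLE = """
<div>
  <pb xml:id="fleboo-032" n="33"/>
  <pb xml:id="fleboo-033" n="34"/>
  <pb xml:id="034-fleboo" n="35"/>
  <pb xml:id="fleboo-033" n="36"/>
</div>
"""

def check_ids(root):
    """Return (value, problem) pairs for malformed or repeated xml:id values."""
    seen = set()
    problems = []
    for element in root.iter():
        value = element.get(XML_ID)
        if value is None:
            continue
        if not ID_PATTERN.match(value):
            problems.append((value, "malformed"))
        if value in seen:
            problems.append((value, "duplicate"))
        seen.add(value)
    return problems

print(check_ids(ET.fromstring(SAMPLE)))
```

A schema-aware editor such as oXygen reports the same problems during validation; the script is only a quick pre-check.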
As part of a page:
<figure>
  <figDesc>An illuminated page from the de Brailes Hours, containing a historiated initial with a signed self-portrait of William de Brailes</figDesc>
  <graphic url="./gfx/debrailes_ms.jpg" height="600px"/>
</figure>
Inserting Images
<figDesc> is not required in Level 3, but we are using it to capture either an image caption or to describe the image if a caption is not present
Footnotes
<p>FIRST ANNUAL REPORT<note place="foot" rend="*">This was the first report made after the schools were regularly organized under the ordinance. The Bethel School mentioned in the opening paragraph had existed through a part of the previous year, the year 1836, and a Board appointed to look after its interests, had made an informal — probably oral — report.</note> OF THE BOARD OF MANAGERS OF COMMON SCHOOLS.</p>
Footnote Encoded (and Marginalia)
If this note were in the MARGIN of the page, it would be encoded, for example:
<note type="auth" place="margin-left">text, text, text</note>
The type= and rend= attributes are optional
In the body of the text:
<p>He liked to eat pie<ptr target="#note1" rend="1"/>.</p>
OR…
<p>See <ref target="#note1">Note 30</ref></p>
At the end of the text:
<div type="endnotes">
  <pb n="130"/>
  <p xml:id="note1">Pie is a dessert often eaten.</p>
</div>
Endnotes
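Because <ptr> and <ref> only work if their target= values match an xml:id somewhere in the file, it can help to list pointers that resolve to nothing. The sketch below is illustrative only: it uses Python's standard library, handles local "#id" pointers only, and the "#note2" reference is an invented error added to show a failing case.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Based on the endnote slide; the second <ref> points at a note that does not exist.
SAMPLE = """
<text>
  <body>
    <p>He liked to eat pie<ptr target="#note1" rend="1"/>.</p>
    <p>See <ref target="#note2">Note 30</ref></p>
  </body>
  <back>
    <div type="endnotes">
      <pb n="130"/>
      <p xml:id="note1">Pie is a dessert often eaten.</p>
    </div>
  </back>
</text>
"""

def dangling_targets(root):
    """Return local pointers (#id) in target= attributes that match no xml:id."""
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for el in root.iter():
        # target= may hold several whitespace-separated pointers.
        for pointer in el.get("target", "").split():
            if pointer.startswith("#") and pointer[1:] not in ids:
                missing.append(pointer)
    return missing

print(dangling_targets(ET.fromstring(SAMPLE)))
```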
Another Method
<l>(Diff'rent our parties, but with equal grace</l>
<l>The Goddess smiles on Whig and Tory race, <ptr rend="unmarked" target="#note3.284"/></l>
<l>'Tis the same rope at sev'ral ends they twist,</l>
<l>To Dulness, Ridpath is as dear as Mist)</l>
<note xml:id="note3.284" type="imitation" place="foot" anchored="false">
  <bibl>Virg. Æn. 10.</bibl>
  <quote>
    <l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
    <l>—— Rex Jupiter omnibus idem.</l>
  </quote>
</note>
Encode with Pointer and Link
PART 8: Encoding References
Definition: Things we know about the content of the text that we want to state explicitly, to add value to the text or to help the reader understand it better, such as:
  Authority control: information about the identity of things named in the text (people, places, books, etc.)
  Additional information: birthdates, geographical locations, dates of publication, etc.
  Interpretive information: themes, keywords
  Normalization of measurements, dates, etc.
Encoding Contextual Information
Names:
<persName>Baron Olivier of Brighton</persName>
<placeName>New York</placeName>
<orgName>Podunk Sewing Club</orgName>
Linguistic: <foreign>, <distinct>, <soCalled>, <mentioned>, <term>, <emph>
<distinct>dinna ken</distinct> why that <foreign xml:lang="fr">soi-disant</foreign> <soCalled>expert</soCalled> must be <emph>so</emph> particular about pronouncing <mentioned xml:lang="cy">Llandaff</mentioned> using a <term>voiceless lateral fricative</term>
Common Tags for Contextual Information
'Ographies:
  prosopography (personography)
  gazetteers (placeography)
  orgography, bibliography
  These are like local authority lists that you create
Keywords applied to the text as a whole
Thematic or interpretive information applied to specific places in the text
Types of Contextual Information
Like a local name authority file
Can be simple or very detailed
Can be kept in your encoded file or externally
Includes specific elements for the most common data
Also includes general elements for the unforeseen
Personography
<teiHeader>
  <!-- ... -->
  <particDesc>
    <listPerson>
      <person xml:id="andrew_j_steere">
        <persName>Steere, Andrew J.</persName>
        <birth when="1844">
          <placeName ref="#l_scituate">Scituate, RI</placeName>
        </birth>
        <death notBefore="1918"/>
      </person>
      <person xml:id="george_pope_morris">
        <persName>Morris, George Pope</persName>
        <birth when="1802">
          <placeName>Philadelphia, PA</placeName>
        </birth>
        <death when="1864"/>
      </person>
    </listPerson>
  </particDesc>
</teiHeader>
<text>
  <p>...However, the plea of Woodman spare that tree and the patriotic pride of the owner, <persName ref="#andrew_j_steere">Mr. Andrew J. Steere</persName>, had guaranteed its safety from the woodsman’s axe.</p>
</text>
Personography Encoding
TEI header, Participation description, listPerson, person
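The value of a personography shows when a name in the running text is resolved to its authority record. The following sketch assumes Python's standard library, an abridged copy of the slide's markup wrapped in a single <TEI> root without namespace, and a hypothetical helper named resolve_people.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged from the slide; only one <person> record is kept.
SAMPLE = """
<TEI>
  <teiHeader>
    <particDesc>
      <listPerson>
        <person xml:id="andrew_j_steere">
          <persName>Steere, Andrew J.</persName>
          <birth when="1844"/>
          <death notBefore="1918"/>
        </person>
      </listPerson>
    </particDesc>
  </teiHeader>
  <text>
    <p>... the owner, <persName ref="#andrew_j_steere">Mr. Andrew J. Steere</persName>,
    had guaranteed its safety ...</p>
  </text>
</TEI>
"""

def resolve_people(root):
    """Pair each name in <text> with the authority form and birth year it points to."""
    people = {p.get(XML_ID): p for p in root.iter("person")}
    resolved = []
    for name in root.find("text").iter("persName"):
        person = people.get(name.get("ref", "").lstrip("#"))
        if person is None:
            continue  # no ref= or no matching record
        birth = person.find("birth")
        year = birth.get("when") if birth is not None else None
        resolved.append((name.text, person.findtext("persName"), year))
    return resolved

print(resolve_people(ET.fromstring(SAMPLE)))
```

The same lookup pattern would apply to an externally kept personography, with the file loaded separately.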
Very similar to personography...but for places Can be linked to maps via geographic information data Placeography (Gazetteer)
<body>
  <p>The tree stood about a mile east of <placeName ref="#l_chepachet">Chepachet</placeName> and a mile north of <placeName ref="#l_spring_grove">Spring Grove</placeName> ... </p>
  <!-- ... -->
</body>
<back>
  <div type="editorial">
    <listPlace>
      <place type="state" xml:id="l_rhode_island">
        <placeName>The State of Rhode Island and Providence Plantations</placeName>
        <country>United States of America</country>
        <region>New England</region>
      </place>
      <place type="settlement" xml:id="l_chepachet">
        <placeName>Chepachet</placeName>
        <region ref="#l_rhode_island"/>
        <location>
          <geo>41.915131 -71.671397</geo>
        </location>
      </place>
      <place type="settlement" xml:id="l_spring_grove">
        <placeName>Spring Grove</placeName>
        <region ref="#l_rhode_island"/>
        <location>
          <geo>41.905583 -71.656219</geo>
        </location>
      </place>
    </listPlace>
  </div>
</back>
Placeography Encoding
back, div, place
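To illustrate how a gazetteer can feed a map, the sketch below reads the <geo> values as latitude and longitude pairs and attaches them to the place names mentioned in the body. It assumes Python's standard library, an abridged copy of the slide's markup under a single <text> root, and hypothetical helper names (gazetteer, mapped_mentions).

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged from the slide: the state record and <region> links are omitted.
SAMPLE = """
<text>
  <body>
    <p>The tree stood about a mile east of
      <placeName ref="#l_chepachet">Chepachet</placeName> and a mile north of
      <placeName ref="#l_spring_grove">Spring Grove</placeName> ...</p>
  </body>
  <back>
    <listPlace>
      <place type="settlement" xml:id="l_chepachet">
        <placeName>Chepachet</placeName>
        <location><geo>41.915131 -71.671397</geo></location>
      </place>
      <place type="settlement" xml:id="l_spring_grove">
        <placeName>Spring Grove</placeName>
        <location><geo>41.905583 -71.656219</geo></location>
      </place>
    </listPlace>
  </back>
</text>
"""

def gazetteer(root):
    """Map each place's xml:id to a (latitude, longitude) tuple read from <geo>."""
    coords = {}
    for place in root.iter("place"):
        geo = place.find("location/geo")
        if geo is not None and geo.text:
            lat, lon = (float(value) for value in geo.text.split())
            coords[place.get(XML_ID)] = (lat, lon)
    return coords

def mapped_mentions(root):
    """List (name as written, coordinates) for every resolvable placeName in <body>."""
    coords = gazetteer(root)
    mentions = []
    for name in root.find("body").iter("placeName"):
        ref = name.get("ref", "")
        if ref.startswith("#") and ref[1:] in coords:
            mentions.append((name.text, coords[ref[1:]]))
    return mentions

root = ET.fromstring(SAMPLE)
for name, (lat, lon) in mapped_mentions(root):
    print(name, lat, lon)
```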
To associate a keyword or interpretive concept with a word, phrase, or passage of text:
<body>
  <div type="section">
    <p>However, the plea of <quote>Woodman spare that tree</quote> and the <seg ana="#patriotism">patriotic pride of the owner</seg>, <persName>Mr. Andrew J. Steere</persName>, had <seg ana="#conservation">guaranteed its safety from the woodsman’s axe</seg>...</p>
  </div>
</body>
<back>
  <div type="editorial">
    <interpGrp>
      <interp xml:id="ri_history">Rhode Island local history</interp>
      <interp xml:id="patriotism">Patriotism and references to the war effort</interp>
      <interp xml:id="commercial">References to commercial harvesting and use of trees</interp>
      <interp xml:id="conservation">Conservation efforts and protection of trees</interp>
      <interp xml:id="arboriculture">References to tree species and their cultivation</interp>
    </interpGrp>
  </div>
</back>
Interpretative Keywords and Themes
PART 9: Some Complex Situations in Text Encoding
Overlapping occurs especially with older texts. XML elements may not overlap, but document structures often do. Examples include:
  Physical features like pages, columns, and lines, and textual things like paragraphs or names
  Verse lines and quotations, names, other phrasal elements
  Verse lines and linguistic features
  Dramatic speeches and verse lines
  Handwritten additions or deletions and other structures
  Typographical features and linguistic features
Overlapping
Mortal, she said, "I'm sent to you,
Then hold my precepts fast;
Remember earth’s best joys are few,
And can’t for ever last."
Example
<lg type="stanza">
  <l>Mortal, she said, <said xml:id="s01" next="#s02">I'm sent to you,</said></l>
  <l><said xml:id="s02" next="#s03" prev="#s01">Then hold my precepts fast;</said></l>
  <l><said xml:id="s03" prev="#s02" next="#s04">Remember earth’s best joys are few,</said></l>
  <l><said xml:id="s04" prev="#s03">And can’t for ever last.</said></l>
</lg>
One Possible Solution
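The next= and prev= pointers let software reassemble the speech that the verse lines split apart. This is a minimal sketch under stated assumptions: Python's standard library, the fragment above without namespace, and a hypothetical function joined_speech that starts at the <said> with no prev= and follows next= until the chain ends.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

STANZA = """
<lg type="stanza">
  <l>Mortal, she said, <said xml:id="s01" next="#s02">I'm sent to you,</said></l>
  <l><said xml:id="s02" next="#s03" prev="#s01">Then hold my precepts fast;</said></l>
  <l><said xml:id="s03" prev="#s02" next="#s04">Remember earth’s best joys are few,</said></l>
  <l><said xml:id="s04" prev="#s03">And can’t for ever last.</said></l>
</lg>
"""

def joined_speech(root):
    """Follow next= pointers from the first <said> fragment and join the text."""
    by_id = {el.get(XML_ID): el for el in root.iter("said")}
    # The first fragment is the one with no prev= pointer.
    current = next(el for el in by_id.values() if el.get("prev") is None)
    parts, visited = [], set()
    while current is not None and current.get(XML_ID) not in visited:
        visited.add(current.get(XML_ID))  # guards against circular pointers
        parts.append("".join(current.itertext()).strip())
        pointer = current.get("next")
        current = by_id.get(pointer[1:]) if pointer else None
    return " ".join(parts)

print(joined_speech(ET.fromstring(STANZA)))
```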
Transcriptional Complexities
<p>Johnston etc 1764 Mr Nikl<unclear>e</unclear> <supplied>s</supplied><gap reason="folded" extent="unknown"/> Brown <unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear> to ten Rum Barels at Four pound &per; Barel — — — £40</p>
Marks a boundary point separating any kind of section of a text where the change is not represented by a structural element
Applies to page breaks, signatures, line breaks, and column breaks
Example:
<pb n="249"/>
<milestone unit="sig" n="R5r"/>
<lb/>digested. Its long trunk, as seen slanting down from
<lb/>out of the building across the wharf and into the ship, …
<lb/>will have been—a farthing.</p>
<pb n="250"/>
<milestone unit="sig" n="R5v"/>
Milestones
Available on every element when the additional tagset for segmentation and alignment is used
To count lines, indicate metrical patterns, etc.: divide into parts
<sp>
  <speaker>Leo.</speaker>
  <l part="F">Go on, go on:</l>
  <l>Thou canst not speake too much, I have deserv’d</l>
  <l part="I">All tongues to talk their bittrest.</l>
</sp>
<sp>
  <speaker>Lord.</speaker>
  <l part="F">Say no more;</l>
  <l>How ere the business goes, you have made fault</l>
  <l part="I">I’th boldnesse of your speech.</l>
</sp>
<sp>
  <speaker>Pauline.</speaker>
  <l part="F">I am sorry for’t;</l>
  <l>All faults I make, when I shall come to know them</l>
  <!-- ... -->
</sp>
Fragmentation
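One use of part= is to count metrical lines rather than <l> elements: an initial part (I) and the final part (F) that follows it in the next speech make up one verse line. The sketch below assumes Python's standard library, the excerpt above wrapped in a hypothetical <div> root, and the part values I, M, F as used in TEI; an F with no preceding I in the excerpt is kept as a line of its own.

```python
import xml.etree.ElementTree as ET

# Excerpt from the slide, wrapped in a <div> so it parses as one document.
SCENE = """
<div>
  <sp><speaker>Leo.</speaker>
    <l part="F">Go on, go on:</l>
    <l>Thou canst not speake too much, I have deserv’d</l>
    <l part="I">All tongues to talk their bittrest.</l>
  </sp>
  <sp><speaker>Lord.</speaker>
    <l part="F">Say no more;</l>
    <l>How ere the business goes, you have made fault</l>
    <l part="I">I’th boldnesse of your speech.</l>
  </sp>
  <sp><speaker>Pauline.</speaker>
    <l part="F">I am sorry for’t;</l>
    <l>All faults I make, when I shall come to know them</l>
  </sp>
</div>
"""

def metrical_lines(root):
    """Merge I/M/F fragments into whole verse lines, in document order."""
    lines, pending = [], []
    for line in root.iter("l"):
        text = "".join(line.itertext()).strip()
        part = line.get("part", "N")
        if part in ("I", "M"):
            pending.append(text)            # line begun, not yet finished
        elif part == "F":
            pending.append(text)            # final fragment closes the line
            lines.append(" ".join(pending))
            pending = []
        else:
            lines.append(text)              # a complete line
    if pending:
        lines.append(" ".join(pending))     # left open at the end of the excerpt
    return lines

verse = metrical_lines(ET.fromstring(SCENE))
print(len(verse))
print(verse[2])
```

For this excerpt the eight <l> elements reduce to six metrical lines.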
Parallelism
Encoding Parallel Structures
<lg type="stanza" xml:lang="fr">
  <l xml:id="fr2.01" corresp="#en2.01">Nos péchés sont têtus, nos repentirs sont lâches;</l>
  <l xml:id="fr2.02" corresp="#en2.02">Nous nous faisons payer grassement nos aveux,</l>
  <l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
  <l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
</lg>
<lg type="stanza" xml:lang="en">
  <l xml:id="en2.01" corresp="#fr2.01">Our sins are stubborn, craven our repentance.</l>
  <l xml:id="en2.02" corresp="#fr2.02">For our weak vows we ask excessive prices.</l>
  <l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
  <l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
</lg>
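Since corresp= is typed by hand in both stanzas, it is worth confirming that every link is answered by a link back. The following is an illustrative sketch using Python's standard library; it keeps only the two crossed line pairs from the slide, wraps them in a hypothetical <div>, and assumes each corresp= holds a single local pointer.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Lines 3 and 4 of each stanza, where the translation reverses the order.
PARALLEL = """
<div>
  <lg type="stanza" xml:lang="fr">
    <l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
    <l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
  </lg>
  <lg type="stanza" xml:lang="en">
    <l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
    <l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
  </lg>
</div>
"""

def non_reciprocal(root):
    """Return (source, target) id pairs whose target does not point back."""
    links = {l.get(XML_ID): l.get("corresp", "").lstrip("#") for l in root.iter("l")}
    return [(src, dst) for src, dst in links.items() if links.get(dst) != src]

root = ET.fromstring(PARALLEL)
print(non_reciprocal(root))  # an empty list means every link is mirrored
```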
<p>...with them, bycause they woulde
<lb/>not be <choice><abbr>boūde</abbr><expan>bounde</expan></choice> also for an other wo[see below]
<lb/>mā at theyr pleasure, whom they
<lb/>knewe not, nor yet what matter
<lb/>was layed unto her charge. Not
<lb/>wythstandynge at the laste, after
<lb/>moche a do and reasonyng to and
<lb/>fro, they toke a bonde of them of
<lb/>recognisaunce for my fourth com
<lb/>mynge. And thus I was at the
<lb/>last, <choice><orig>delyuered</orig><reg>delyvered</reg></choice>. Written by me An
<lb/>ne Askewe.
</p>
Textual Splitting CHOICE
Textual Splitting (continued)
…
<choice>
  <abbr>
    <choice>
      <sic>wo<lb/>mā</sic>
      <corr>wo-<lb/>mā</corr>
    </choice>
  </abbr>
  <expan>
    <choice>
      <sic>wo<lb/>man</sic>
      <corr>wo-<lb/>man</corr>
    </choice>
  </expan>
</choice>
also for an other wo[see below]
<lb/>mā at theyr pleasure, whom they
You can put CHOICE inside name, or outside (nesting)
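The point of <choice> is that one encoded file can yield more than one reading text. As a rough sketch (Python standard library; the passage is abridged from the preceding slide, with "[...]" marking the omission; the function name reading is hypothetical), the recursion below keeps one alternative from each <choice>, including nested ones, according to a set of preferred element names.

```python
import xml.etree.ElementTree as ET

# Abridged passage; "[...]" marks text left out for brevity.
PASSAGE = (
    "<p>they woulde not be "
    "<choice><abbr>boūde</abbr><expan>bounde</expan></choice>"
    " also [...] And thus I was at the last, "
    "<choice><orig>delyuered</orig><reg>delyvered</reg></choice>.</p>"
)

DIPLOMATIC = {"abbr", "orig", "sic"}   # what the source actually shows
NORMALIZED = {"expan", "reg", "corr"}  # the editor's regularized reading

def reading(element, prefer):
    """Flatten element to text, keeping the preferred child of each <choice>."""
    out = [element.text or ""]
    for child in element:
        if child.tag == "choice":
            options = list(child)
            # Fall back to the first alternative if none is in the preferred set.
            picked = next((o for o in options if o.tag in prefer),
                          options[0] if options else None)
            if picked is not None:
                out.append(reading(picked, prefer))
        else:
            out.append(reading(child, prefer))
        out.append(child.tail or "")
    return "".join(out)

p = ET.fromstring(PASSAGE)
print(reading(p, DIPLOMATIC))
print(reading(p, NORMALIZED))
```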
For an unclear reading, you can put multiple <unclear> elements within <choice>
Use the <unclear>, <supplied>, and <gap> elements for what you cannot read in the manuscript
Example:
Brown <unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear>
Unclear Texts
<teiHeader>
  <fileDesc>
    <!-- ... -->
    <notesStmt>
      <note>The historiated capital on folio 43r contains a signed self-portrait of William de Brailes. The text is richly illuminated ...</note>
    </notesStmt>
    <!-- ... -->
  </fileDesc>
</teiHeader>
Descriptive Prose for Images
Formalization for Images
<physDesc>
  <objectDesc form="codex">
    <supportDesc material="vellum">
      <extent>
        <dimensions>
          <height quantity="150" unit="mm"/>
          <width quantity="124" unit="mm"/>
        </dimensions>
      </extent>
    </supportDesc>
    <layoutDesc>
      <layout columns="1" ruledLines="12"/>
    </layoutDesc>
  </objectDesc>
  <handDesc>
    <handNote scribe="william_debrailes" script="carolingian" medium="ink" scope="sole"/>
  </handDesc>
</physDesc>
practice
PART 10: TEI Guidelines
The outward expression of TEI to the user is the TEI Guidelines: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html
These guidelines include schemas (formal rules for encoding documents) and documentation that explains how to apply these rules
Customization lets you apply TEI markup as strictly or as loosely as you wish
TEI Guidelines = Your instruction manual
TEI Guidelines
Topically – into chapters and modules
Semantically and functionally – into classes
Pedagogically – into discussion and reference
TEI provides the tools. You have to ask: What is my project? What are my needs? What level of granularity do I want?
TEI Guidelines Divisions
Some modules (e.g. TEI header) are required – core elements
Some modules provide genre- or discipline-specific elements: e.g., the verse module contains <rhyme> and <caesura>
Some modules provide functionality: e.g., the linking module contains markup for encoding arbitrary passages and noting links between them
Chapters and Modules
Classes are functional or semantic groups of elements. Two kinds:
  model class: elements that appear in the same place in the logical structure
    e.g., model.milestoneLike contains those milestone elements used to describe the physical and typographic structure of a codex: pb, cb, lb, and milestone
  attribute class: elements that share a common attribute
    e.g., att.ascribed is the class of elements that share the who attribute, i.e. can be attributed to an individual, including change, q, and sp
Classes
Classes are composed of elements and sometimes other classes:
  elements and attribute classes can be members of attribute classes
    a given element (or attribute class) may be a member of more than one attribute class
  elements, model classes, or attribute classes can be members of model classes
    a given element (or class) may be a member of more than one model class
Class Constituency
Datatypes are constraints on (attribute) values. May limit values to things like:
  a supplied list of possible values: e.g. sex of person
  a user-provided list of possible values: e.g. type of div
  strings that conform to a specific format: e.g. when of date, extent of gap, or xml:lang
Datatypes
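To make "strings that conform to a specific format" concrete, the sketch below tests candidate when= values against a pattern. It is a deliberate simplification: the pattern accepts only YYYY, YYYY-MM, and YYYY-MM-DD, whereas the TEI datatype for when= admits further W3C forms (times, time zones, and so on), and valid_when is a hypothetical name.

```python
import re

# Simplified assumption: year, year-month, or full date in ISO style.
W3C_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")

def valid_when(value):
    """True if value looks like YYYY, YYYY-MM, or YYYY-MM-DD."""
    return bool(W3C_DATE.match(value))

for candidate in ["1865-08-05", "1844", "August the 5th", "1865-13-01"]:
    print(candidate, valid_when(candidate))
```

A schema performs this kind of check during validation, which is why the human-readable date ("August the 5th") goes in the element content and the normalized form goes in the attribute.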
TEI Guidelines…
contain: a prose discussion of encoding topics, a reference section, and schemas that constrain encoding
One Document Does It All (ODD)
PART 11: Schema
TEI Under the hood http://www.tei-c.org/Roma/
Roma, the web interface to the TEI customization mechanism, performs two functions:
  Provides a user interface for editing TEI ODD files
  Provides a user interface for generating TEI schemas and documentation
ROMA, The Web Tool
Select which modules to use and not to use
Choose what elements in those modules to use and not to use
Choose what attributes on those elements to use and not to use
Change element or attribute names
Restrict the values of attributes – IMPORTANT
Constrain structure
Add new elements
Add new attributes
Produce an internationalized version of the TEI
Customization Options
Customize:
  Values for type= attributes
  Values for any kind of information for which a controlled vocabulary exists
However:
  Do NOT remove the header
  Do NOT remove required elements from the header
  Do NOT duplicate things which the TEI has already provided
What To Customize and Not Customize
Click Submit
STEP 1: Enter a (1) title, (2) filename (all one word, no special characters), (3) author name, and (4) description. Leave everything else as is.
STEP 2: Click “submit”
STEP 3: Click the MODULES tab
MODULES tab
1. Do NOT remove the “List of selected Modules” given to you (but click link to change attributes)
2. Click links under “Module Name” to:
   a. See what elements are available
   b. Exclude elements
   c. Rename elements
   d. Change attributes
3. To change values for attributes, click the attribute itself.
4. Click “add” on left-hand side to add elements with their attributes
Clicking “add” inserts this module here. Next, click “namedates”
You can exclude elements Click to change attributes
Click on these links to change the values for the attributes When you’re done changing elements and attributes, click here
Change the values for the attributes (type= ) Click if no changes are made (bypass  “ submit query”) Click when finished
Return to your list; note that if you changed any attributes, the “changes” column indicates this. Click the “Schema” tab
It is best to choose either “Relax NG schema (compact syntax)” or “W3C schema”
Click “Documentation” to save documentation for this schema
Click “Sanity Checker” to make sure your schema validates
NOTE: You can return to the Roma tool later to edit your schema
Conclusion
More and better documentation
More use (and support for use) by individuals
More discipline-specific customizations
Future Trends in TEI
Historical Event Markup Language (HEML): http://www.heml.org/heml-cocoon/
Music Markup Language: http://www.musicmarkup.info/
Multi-Element Coding System: http://helmer.hit.uib.no/claus/mecs/mecs.htm
Other Encoding Possibilities
WWP Guide to Scholarly Text Encoding: http://www.wwp.brown.edu/encoding/guide/index.html
TEI web site: http://www.tei-c.org/index.xml
The TEI listserv (TEI-L)
TEI Wiki: http://www.tei-c.org/wiki/index.php/Main_Page
Teach Yourself TEI: http://www.tei-c.org/Support/Learn/tutorials.xml
Guidelines for Text Encoding and Interchange: http://quod.lib.umich.edu/t/tei/
A Gentle Introduction to XML: http://www.tei-c.org/release/doc/tei-p4-doc/html/SG.html
A Companion to Digital Literary Studies: http://www.digitalhumanities.org/companion/DLS/
References
  • 9.
    Tailor searching underspecific genres (e.g. verse, drama, prose) Search different formats (e.g. chronicle, diary) Search across collections Search by mode (e.g. satire, pastoral) Search by historical or geographic period Search by title, author, and subject headings Search via structural features of text itself, including: Sections Headings Paragraphs Quotations Highlighting Footnotes Captions Text Encoding Allows Users To…
  • 10.
    Digital libraries anddigital archives Anthropology and social sciences Literary and cultural materials Scholarly editions Manuscript collections and descriptions Dictionaries Language corpora Historical documents Authoring Linguistics Who Does Text Encoding? Where Is It Found?
  • 11.
    Technically : astandards organization for humanities text encoding Organizationally : an international membership consortium Socially : a community of people and projects Web site: http://www.tei-c.org/ What Is the Text Encoding Initiative (TEI)?
  • 12.
    PART 2: TextEncoding and XML
  • 13.
    Texts are encodedusing eXtensible Markup Language (XML) XML is… Easy to understand. Non-proprietary plain-text: Human readable Software independent Hardware independent (relatively) easy to write a parser for. Widespread: Well-supported by commercial and open-source software. Text Encoding and XML
  • 14.
    XML Documents MustBe: Well-formed: Have no syntax errors and conform to XML code specifications <title>Little Memoirs of the Nineteenth Century</title> <author>George Paston</author> Valid: Satisfy the rules of a DTD, Schema, or RELAX NG If DTD or Schema says that author name must come before the title, then content above would be rejected
  • 15.
    XML Vocabulary Elements,Content, Attributes, Values <titleStmt> <title type=“m”>Little Memoirs</title> Element Attribute Value Content </titleStmt> Nested <titleStmt> is PARENT ELEMENT. <title> is the CHILD ELEMENT for <titleStmt>
  • 16.
    <biblStruct> <titleStmt> <title level=&quot;m&quot;>Early history of the Cleveland Public Schools</title> <author><persName>Freese, Andrew</persName></author> </titleStmt> <extent>128 p. : ill. ; 23 cm.</extent> <publicationStmt> <!-- groups information concerning publisher, place of publication, and date of the text --> <pubPlace>Cleveland, Ohio</pubPlace> <publisher>Robison, Savage &amp; Co., Book Printers</publisher> <date>1876</date> <!-- contains a date in any format, with normalized value in the value attribute, of bibliographic item's original publication --> </publicationStmt> <notesStmt> <note>by Andrew Freese ; Published by order of the Board of Education.</note> </notesStmt> </biblStruct> Quick Example
  • 17.
    A valid TEIdocument follows the rules of a schema that describes it. The Schema (or DTD) ensures that all required elements are present in the document The schema may prevent undefined elements from being used The schema may enforce a specific data structure The schema may specify the use of attributes and define their possible values The schema may define default values for attributes An XML document can be well-formed but NOT valid An XML document can never be valid without being well-formed Validity
  • 18.
    Schema Examples <bookmeasure=“centimeters”>21</book>  <xs:element name=“book&quot;> <xs:complexType> <xs:simpleContent> <xs:extension base=“xs:string”> <xs:attribute name=“measure” type=“xs:string” /> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <book bookISBN=“152-32-29359535”>Go Tell It on the Mountain</book> <authorLastName>Baldwin</authorLastName> <authorFirstName>James</authorFirstName>  <xs:element name=“book&quot;> <xs:complexType> <xs:sequence> <xs:element ref=“authorLastName&quot; /> <xs:element ref=“authorFirstName&quot; /> </xs:sequence> <xs:attribute ref=“bookISBN&quot; use=&quot;required&quot; /> </xs:complexType> </xs:element>
  • 19.
    PART 3: Levelsof TEI Encoding
  • 20.
    Latest iteration ofTEI is Protocol 5 (a.k.a. P5) Current TEI Consortium Best Practices Group (formed in Summer 2008) has been establishing best practices and standards for: TEI headers Level One encoding: Fully Automated Conversion and Encoding Level Two: Minimal Encoding Level Three: Simple Analysis Level Four: Basic Content Analysis Level Five: Scholarly Encoding Projects The BPG will present its work at the Digital Library Federation conference in early May, get feedback, and publish a final document later in 2009 Five Levels
  • 21.
    Level 1 Encoding: Fully Automated Conversion and Encoding To create electronic text with the primary purpose of keyword searching and linking to page images The text is subordinate to the page image, and is not intended to stand alone as an electronic text (without page images). Most suitable for: A large volume of material to be made available online quickly When a digital image of each page is desired No manual intervention is performed in the text creation process material is of interest to a large community of users who wish to read texts that allow keyword searching sophisticated search and display capabilities based on the structure of the text are not necessary
  • 22.
    Level 1 Encoding:Characteristics <div1> or <div> There should be only one child of <body>: a single <div> (or <div1>) <ab> There should be only one child of the <div> (or <div1>): a single <ab> wrapping all text OCR text. If the text is ever “upgraded” to a Level 3 or higher, the <ab> element will be replaced by structural elements like <p> and <table>. <pb> Required in Level 1. Page images can be linked to the text by specifying a jpeg or other image file as the value of the facs= attribute. Page numbers can be supplied with the n= attribute to record the number that is on the page. The Task Force sees the use of METS here as having a tremendous advantage. METS/TEI page turning documentation will be included in the near future.
  • 23.
    Level 2 Encoding: Minimal Encoding To create electronic text for full-text searching, linking to page images, and identifying simple structural hierarchy to improve navigation. (For example, you can create a table of contents from such encoding.) The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images) Requires some human intervention to identify each textual division and heading. Most suitable for: A large volume of material to be made available online quickly When a digital image of each page is desired material is of interest to a large community of users who wish to read texts that allow keyword searching Rudimentary search and display capabilities based on the large structures of the text are desired Each text is checked to ensure that divisions and headers are properly identified
  • 24.
    Level 2 Encoding:Characteristics All elements specified in Level 1 plus the following: <front>, <back> Optional <div1> or <div> If no type= attribute is specified, a type= value of &quot;section&quot; should be presumed. <head> Required if present. <ab> At least one container element is required. <fw> Running heads; can be automatically generated
  • 25.
    <?oxygen RNGSchema=&quot;http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng&quot; type=&quot;xml&quot;?><TEI xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;> <TEI xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;>  <teiHeader type=&quot;text&quot;> [stuff] </teiHeader>  <text>    <front>       [title page information, table of contents, prefaces, etc.]      [optional]    </front>    <body>      <div type=&quot;section&quot;>        <pb n=&quot;1&quot; facs=&quot; [URI of page 1 image] &quot;/>        <head> [heading of section 1] </head>        <ab> [entire contents of section 1 here, with           interspersed <pb /> elements pointing to page           images; in this example there are 26 more pages           to section 1] </ab>      </div>      <div type=&quot;section&quot;>        <pb n=&quot;27&quot; facs=&quot; [URI of page 27 image] &quot;/>        <div type=&quot;subsection&quot;>          <head >[heading of section 2 subsection 1]</head>          <ab>[all the paragraphs of subsection one go here            with page breaks inserted] </ab>        </div>      </div>    </body>    <back> [optional] </back>  </text> </TEI> P5 Level 2 Encoding Template
  • 26.
    <?oxygen RNGSchema=&quot;http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng&quot; type=&quot;xml&quot;?><TEI xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;> <TEI xml:id=&quot;someid&quot; xmlns=&quot;http://www.tei-c.org/ns/1.0&quot;> <teiHeader> [Source and processing information goes here] </teiHeader> <text> <body> <div1> <pb n=&quot;113&quot; facs=&quot; 00000001.tif &quot;/> <head>POINT VIII: BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED.</head> <ab> POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; … <pb n=&quot;114&quot; facs=&quot;00000002.tif&quot;/> on the Hiss mail in 1945, … <pb n=&quot;115&quot; facs=&quot;00000003.tif&quot;/> occurred from December 13, 1945 until the Hisses moved from Washington, D.C. to New York City on September 13, 1947. … </ab> </div1> </body> </text> </TEI> P5 Level 2 Encoding Example
  • 27.
    Level 3 Encoding: Simple Analysis To create text that can stand alone as electronic text Identifies hierarchy (logical structure) and typography without content analysis being of primary importance Features to be encoded are determined by the logical structure and appearance of the text can stand alone as text without page images Most suitable for: Some sophistication of display, delivery, and searching based on structure of the text is desired Texts will be checked to ensure that encoding decisions have been made appropriately material is of interest to a large community of users who wish to read texts that allow keyword searching
  • 28.
    Level 3 Encoding:Characteristics All elements specified in Levels 1 and 2 plus the following : <front>, <back> Required if present <div> Required if present; type attribute is recommended <floatingText> Recommended if present. <p> Required for paragraph breaks in prose. <lg> and <l> Required for identifying groups of lines and lines, respectively <list> and <item> May be used in this level to indicate ordered and unordered list structures <table>, <row>, and <cell> May be used to indicate table structures. <figure> Required to indicate figures other than page images <hi> Required to indicate changes in typeface; rend attribute is optional <note> All notes must be encoded. It is also recommended that notes that extend beyond one page be combined into one <note> element. Marginal notes, without reference, should occur at the beginning of the paragraph to which they refer, with the value of the place attribute as &quot;margin&quot;
  • 29.
    Level 3 Encoding: General Recommendations
    Front matter, <div type="contents">: Use lists to mark up the table of contents, with the <ptr> tag used to reference the starting page number. The <ptr> tag can reference the <pb> identifier OR an identifier (e.g., @xml:id) placed in the corresponding division of text.
    Body, <note>: Encode notes inline, inserted at the point of reference. An n attribute records the value of the note reference if there is one.
    Back matter, <div type="index">: Use lists to mark up index entries, with the <ref> tag used to reference the corresponding page number. Add the target attribute (@target) pointing to the <pb> identifier to generate links from the index into the text proper.
    Running heads, catch words, and other such forme work should NOT be included in Level 3, with the exception of page numbers, which are recorded using <pb>.
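The body and back-matter recommendations above could be sketched as follows. This is only an illustration: the identifier demo-p047, the note text, and the index entry are all invented for the example.

```xml
<!-- Hypothetical fragment; identifiers and text are placeholders -->
<text>
  <body>
    <div type="chapter">
      <pb n="47" xml:id="demo-p047"/>
      <!-- Inline note at the point of reference; n records the note marker -->
      <p>The harbor froze that winter<note n="1" place="foot">Local records
        disagree on the year.</note> and trade stopped.</p>
    </div>
  </body>
  <back>
    <div type="index">
      <list type="index">
        <!-- target points at the xml:id of the page break -->
        <item>Harbor, freezing of, <ref target="#demo-p047">47</ref></item>
      </list>
    </div>
  </back>
</text>
```

Note the difference from the table of contents pattern: <ptr> is empty, while <ref> wraps the printed page number so the number itself becomes the link.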
  • 30.
    Level 3 Encoding: Prose Example
    <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
      <teiHeader> [stuff] </teiHeader>
      <text>
        <front>
          <div type="frontispiece">[figure]</div>
          <titlePage>[text]</titlePage>
          <div type="dedication">[text]</div>
          <div type="contents">[text]</div>
        </front>
        <body>
          <div type="book">
            <head>[book title]</head>
            <div type="chapter"> <pb n="3" xml:id="freear-p03"/>[text] </div>
            <div type="chapter"> <pb n="12" xml:id="freear-p12"/>[text] </div>
            <div type="chapter">[text]</div>
          </div>
        </body>
        <back>
          <div type="appendix">[text]</div>
          <div type="index">[text]</div>
        </back>
      </text>
    </TEI>
    Table of Contents:
    <!-- @target references page break identifier -->
    <div type="contents">
      <head>CONTENTS</head>
      <list type="simple">
        <item>I. A Boy and His Dog <hi rend="right">3</hi> <ptr target="#freear-p03"/></item>
        <item>II. Romance <hi rend="right">12</hi> <ptr target="#freear-p12"/></item>
      </list>
    </div>
  • 31.
    Level 3 Encoding: Verse Example
    <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
      <teiHeader> [stuff] </teiHeader>
      <text>
        <front>
          <titlePage>[text]</titlePage>
          <div type="dedication">[text]</div>
          <div type="contents">[text]</div>
        </front>
        <body>
          <div type="book">
            <head>[book title]</head>
            <div type="part">
              <head>[section title]</head>
              <div type="poem">
                <head>THE DAYS GONE BY.</head>
                <lg>
                  <l n="1">O the days gone by! O the days gone by!</l>
                  <l n="2">The apples in the orchard, and the pathway through the rye;</l>
                  <l n="3">The chirrup of the robin, and the whistle of the quail</l>
                  <l n="4">As he piped across the meadows sweet as any nightingale;</l>
                </lg>
                <lg>[lines of poetry]</lg>
                <lg>[lines of poetry]</lg>
              </div>
            </div>
          </div>
        </body>
      </text>
    </TEI>
  • 32.
    Level 4 Encoding: Basic Content Analysis
    Purpose: to create text that can stand alone as electronic text.
    Identifies hierarchy and typography, specifies the function of textual and structural elements, and describes the nature of the content and not merely its appearance.
    Features of the text that may contribute to meaning, such as indentation of verse lines and typographic change, are preserved.
    Most suitable when:
      sophisticated search and retrieval capabilities are desired
      texts will be used for textual analysis
      users of the texts may have limited storage or display capabilities
  • 33.
    Level 4 Encoding: Characteristics
    All elements specified in Levels 1, 2, and 3, plus the following (among others; see the TEI BPG Guidelines):
      <titlePage> and child elements: Required if present
      <group>: Required to encode a collection of independent texts that are regarded as a single group for processing or other purposes
      <emph>, <foreign>, <gloss>, <term>, or <title>: Recommended to identify typographically distinct text
      <epigraph>, <quote>, <said>, <mentioned>, or <soCalled>: Recommended to represent speech, thought, quotation, etc.
      <sic>, <corr>, or <choice>: Recommended to encode errors or typos
      <add>, <del>, <gap>, and <unclear>: Recommended to encode material that is omitted, added, marked for deletion, or is illegible, invisible, or inaudible
      <opener>, <dateline>, <salute>, <closer>, <signed>, <postscript>: Required to indicate specific parts of letters
      <sp>, <speaker>, and <stage>: Required to encode different dramatic structures
      <sp> and <speaker>: Required to encode oral history interviews
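As a minimal sketch of the content-level tagging described above, the paragraph below combines a few of the recommended Level 4 elements. The sentence is invented purely to carry the markup.

```xml
<!-- Hypothetical sentence; only the element usage is the point -->
<p>She called the plan a <soCalled>triumph</soCalled>, quoted the motto
  <foreign xml:lang="la">festina lente</foreign>, and
  <choice><sic>recieved</sic><corr>received</corr></choice> a reply in which
  one word was <unclear>blotted</unclear> and another lost
  <gap reason="illegible" quantity="1" unit="word"/>.</p>
```

Pairing <sic> and <corr> inside <choice> keeps both the printed error and the correction, so a display can show either one.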
  • 34.
    Example of Level 4 Encoding
    <p>But it is well authenticated by the observation of every one, that <del rend="overstrike" hand="#JHL">their manner</del> <add rend="sup" hand="#JHL">this way—i.e. the above</add> of writing influences the style of compos. of those who practise it considerably, when they grow up to years of manhood; for their productions, <del hand="#JHL" rend="overstrike">instead</del> far from being terse, argumentative, convincing, are without head or tail &amp; are generally an incongruous mass mixed up in the most disgusting manner, without divisions or heads &amp; in short without a subject (so to speak).</p>
  • 35.
    Level 5 Encoding: Scholarly Encoding Projects Level 5 texts are those that require subject knowledge, and encode semantic, linguistic, prosodic, or other elements beyond a basic structural level
  • 36.
    Example: Variant Readings in Level 5
    <l>So hath myn
      <app>
        <lem wit="#msB #msC">herte</lem>
        <rdg wit="#msA">hert</rdg>
        <rdg wit="#msD">minde</rdg>
        <rdg wit="#msE">mynde</rdg>
      </app>
      Caught in remembraunce</l>
    <app>: apparatus (a critical apparatus entry)
    <lem>: lemma, or base text
    <rdg>: variant reading
  • 37.
    General Recommendations
    An encoding project should strive for internal consistency and for use of standards so that the data can be modified or enhanced in the future with ease.
    When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work.
    Certain features of the text, such as publisher's advertisements or indexes, should be included as links to page images.
    Any omissions of material found in the original work should be noted in the <editorialDecl> in the TEI header.
    An encoding project should use only numbered divisions (i.e., <div1>, <div2>, etc.) or unnumbered divisions (i.e., <div>), but not both.
    Whether numbered or unnumbered divisions are used, the @type attribute of the division element is not recommended at level 1, is optional at level 2, is recommended at level 3, and is required at levels 4 and 5.
    Page breaks should be encoded using the <pb> element, which should mark the top of a page (i.e. the text of page seven should immediately follow <pb n="7"/>), and should always be contained within a div for ease of retrieval with indexing software.
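Two of these recommendations, noting omissions in the header and keeping each <pb> inside a division, might look like the two separate fragments below. The wording of the editorial note and the chapter content are placeholders.

```xml
<!-- Fragment 1, in the header: record what was left out (placeholder wording) -->
<encodingDesc>
  <editorialDecl>
    <p>Publisher's advertisements at the end of the volume have not been
      transcribed; they are available as page images only.</p>
  </editorialDecl>
</encodingDesc>

<!-- Fragment 2, in the text: unnumbered divisions only, each <pb> inside a <div> -->
<body>
  <div type="chapter" n="1">
    <pb n="7"/>
    <head>Chapter 1</head>
    <p>The text of page seven begins immediately after the page break.</p>
  </div>
</body>
```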
  • 38.
    PART 4: Short Practice in Text Encoding
  • 39.
    Author: James Wallen
    Title: Cleveland’s Golden Story
    Publishing Place and Publisher: [Cleveland, OH]: Wm. Taylor Son & Co.
    Year: 1920
    93 pp.
    CONTENTS, TEXT, ETC.
      Chapter 1. The Kingdom of God. 1
      Chapter 2. Lincoln-Hearted Men 9
      Chapter 3. Taming the Wilderness 19
  • 40.
    Chapter Heading and Paragraph
    Chapter 1: The Kingdom of Gold
    Gold is the symbol of adventure—the unresting urge that stirs men’s souls. Francois de Orlenna, who crossed the South American continent from ocean to ocean in 1540, wrote, “Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.”
    His catalog of iritations included:
      1. The weather
      2. The peacocks
      3. His meagre grasp of Hamlet, Prince of Denmark
  • 41.
    P5 Level 2 Encoding
    <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
    <TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader type="text"> [stuff goes here] </teiHeader>
      <text>
        <front>
          <list>
            <item>Chapter 1. The Kingdom of God. 1</item>
            <item>Chapter 2. Lincoln-Hearted Men. 9</item>
            [ETC.]
          </list>
        </front>
        <body>
          <div type="section">
            <pb n="1" facs="p1.jpg"/>
            <head>The Kingdom of God</head>
            <ab> [a whole section is contained within this anonymous block tag; interspersed with <pb> elements pointing to page images] <pb xml:id="p21198-zz0002mpwb" n="2"/> </ab>
          </div>
        </body>
        <back> [optional] </back>
      </text>
    </TEI>
  • 42.
    P5 Level 3 Encoding
    <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
    <TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader type="text">[stuff goes here]</teiHeader>
      <text>
        <front>
          <pb n="1" xml:id="walcle01-00"/>
          <div type="contents">
            <list>
              <item>Chapter 1. The Kingdom of God. <hi rend="right">1</hi> <ptr target="#walcle01-p1"/></item>
              <!-- assumes the page break that opens Chapter 2 carries xml:id="walcle01-p9" -->
              <item>Chapter 2. Lincoln-Hearted Men. <hi rend="right">9</hi> <ptr target="#walcle01-p9"/></item>
              [ETC.]
            </list>
          </div>
        </front>
        <body>
          <div type="chapter">
            <pb n="1" xml:id="walcle01-p1"/>
            <head type="main">Chapter 1</head>
            <head type="subtitle">The Kingdom of God</head>
            <p> [FIRST PARAGRAPH GOES HERE] </p>
          </div>
        </body>
        <back> [optional] </back>
      </text>
    </TEI>
  • 43.
    P5 Level 3 Continued
    <p>Gold is the symbol of adventure—the unresting urge that stirs men’s souls. Francois de Orlenna, who crossed the South American continent from ocean to ocean in 1540, wrote, “Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.”</p>
    <p>His catalog of iritations included:
      <list>
        <item>1. The weather</item>
        <item>2. The peacocks</item>
        <item>3. His meagre grasp of <hi>Hamlet, Prince of Denmark</hi></item>
      </list>
    </p>
  • 44.
    P5 Level 4 Encoding
    <p>Gold is the symbol of adventure—the unresting urge that stirs men’s souls. <name type="person" key="FDO1">Francois de Orlenna</name>, who crossed the South American continent from ocean to ocean in <date when="1540">1540</date>, wrote, <q>Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.</q></p>
    <p>His catalog of <choice><sic>iritations</sic><corr>irritations</corr></choice> included:
      <list>
        <item>1. The weather</item>
        <item>2. The peacocks</item>
        <item>3. His meagre grasp of <hi><bibl><title ref="#hamlet1">Hamlet, Prince of Denmark</title></bibl></hi></item>
      </list>
    </p>
    …
    <biblStruct xml:id="hamlet1">
      <monogr>
        <author>Shakespeare, William</author>
        <title>Hamlet, Prince of Denmark</title>
        <imprint><date/></imprint>
      </monogr>
    </biblStruct>
  • 45.
  • 46.
    TEI Header
    Provides administrative, descriptive, and preservation metadata:
      Administrative: who created the metadata? When was it created? Where is the original item located? Etc.
      Descriptive: title, author, publication info, subject headings, number of pages, etc.
      Preservation: file size, identifier, format, etc.
  • 47.
    Basic Components of TEI Header
    Electronic Version Information: Information about the ELECTRONIC version of the work(s).
    Electronic Distributor Information: Information about the publisher of the ELECTRONIC version of the work(s). E.g. William Taylor & Co. published the original work, but Kelvin Smith Library is publishing the electronic version.
    Original Document Bibliographic Information: Bibliographic information of the text from which the electronic version was derived. May be generated from a MARC record (but does not have to be).
    Encoding Description: Includes project description, encoding level declaration, what classification structure is used (e.g. LCSH), etc.
    Profile Description: Includes text language, subject terms.
    Revision Description: If any revision was done to the TEI document, this is where that information is recorded, including revision details, party(ies) involved, and date(s).
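One way these components might map onto header elements is sketched below. The mapping of slide labels to elements is an assumption made for illustration, bracketed text is placeholder content, and the date and the #rlw54 pointer in <change> are example values only.

```xml
<teiHeader>
  <fileDesc>
    <!-- Electronic version information -->
    <titleStmt>
      <title>[title of the electronic version]</title>
    </titleStmt>
    <!-- Electronic distributor information -->
    <publicationStmt>
      <publisher>[publisher of the electronic version]</publisher>
    </publicationStmt>
    <!-- Original document bibliographic information -->
    <sourceDesc>
      <bibl>[citation for the original printed work]</bibl>
    </sourceDesc>
  </fileDesc>
  <!-- Encoding description -->
  <encodingDesc>
    <projectDesc><p>[project description]</p></projectDesc>
    <editorialDecl><p>[encoding level and editorial practices]</p></editorialDecl>
    <classDecl>
      <taxonomy xml:id="lcsh"><bibl>Library of Congress Subject Headings</bibl></taxonomy>
    </classDecl>
  </encodingDesc>
  <!-- Profile description -->
  <profileDesc>
    <langUsage><language ident="en">English</language></langUsage>
    <textClass>
      <keywords scheme="#lcsh"><term>[subject term]</term></keywords>
    </textClass>
  </profileDesc>
  <!-- Revision description (placeholder date and responsible party) -->
  <revisionDesc>
    <change when="2010-01-15" who="#rlw54">[what was revised]</change>
  </revisionDesc>
</teiHeader>
```

Only <fileDesc> is strictly required by TEI; the other three sections are optional but are where most of the project-level documentation ends up.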
  • 48.
    TEI Header (continued)
    Can reflect a text center’s standards and serve as the basis for other types of metadata system records.
    Can function in detached form as records in a catalog, as a title page inherent to the document, or as a source for index displays.
    May describe a collection of documents, a single item, or a portion of an item.
    A TEI header may NOT necessarily have a one-to-one correspondence with a MARC record. One TEI header may have multiple MARC analytic records, or one MARC record may be used to describe a collection of TEI documents with individual headers.
    May contain historical background on how the file has been treated and extend the information of a classic catalog record.
    There is no ONE header template. Modification is needed depending on project and text type.
  • 49.
    Example: MARC to TEI Header
    LEADER 00000nam 2200000Ia 4500
    001 49237829
    003 OCoLC
    005 20020305071435.0
    008 020305s1905 ohu r 000 0 eng d
    040 CWR|cCWR
    049 CWRR
    090 BJ1161|b.G6 1905a
    100 1 Given, Charles Stewart
    245 12 A fleece of gold :|bfive lessons from the fable of Jason and the Golden Fleece /|cby Charles Stewart Given
    260 Cincinnati [Ohio] :|bJennings and Graham ;|aNew York [N.Y.] :|bEaton and Mains,|cc1905
    300 103 p. ;|c18 cm
    533 Photocopy.|bLaCrosse, Wis. :|cBrookhaven Press : digital production by Northern Micrographics, Inc.,|d2001.|e18 cm
    650 0 Success
    650 0 Conduct of life
    650 0 Jason (Greek mythology)
    <sourceDesc>
      <biblFull>
        <titleStmt>
          <title type="main">A fleece of gold</title>
          <title type="sub">five lessons from the fable of Jason and the Golden Fleece</title>
          <!-- subheading [if applicable] -->
          <author> <persName>Given, Charles Stewart</persName> </author>
        </titleStmt>
        <extent>103 p.</extent>
        <publicationStmt>
          <!-- groups information concerning publisher, place of publication, and date of the text -->
          <publisher>Jennings and Graham</publisher>
          <pubPlace>Cincinnati [Ohio]</pubPlace>
          <date>1905</date>
          <idno>BJ1161 .G6 1905a</idno>
        </publicationStmt>
      </biblFull>
    </sourceDesc>
    …
    <profileDesc>
      <textClass>
        <keywords scheme="LCSH">
          <!-- if the keywords come from a controlled vocabulary, it can be identified by the scheme attribute -->
          <term>Success</term>
          <term>Conduct of life</term>
          <term>Jason (Greek mythology)</term>
        </keywords>
      </textClass>
    </profileDesc>
  • 50.
    Session 2: Text Encoding and the Text Encoding Initiative (TEI) Richard Wisneski Head, Bibliographic/Metadata Services Kelvin Smith Library Case Western Reserve University 2009-2010
  • 51.
    PART 6: Basic Text Type Structures
  • 52.
    Introduction
    Before encoding a text, skim through it and mark out its structure. For books, take note of:
      Volumes
      Parts
      Chapters
      Section breaks
      Table of contents
      End matter (indices, endnotes, bibliography, glossary, appendices, colophon, etc.)
      Front matter (title page, dedication, epigraph, preface, introduction, etc.)
  • 53.
    Sample Book Structure
    This book has: A. an Introduction, B. two chapters, each with a heading and two sections, and C. an Index
  • 54.
  • 55.
    Basic Document Structure: Un-numbered Division
    <text>
      <front>
        <div type="preface"> <!-- ... --> </div>
        <div type="introduction"> <!-- ... --> </div>
      </front>
      <body>
        <div type="chapter" n="1">
          <head type="main">Chapter 1</head>
          <head type="subtitle">Wines</head>
          <div type="section">White wines ... </div>
          <div type="section">Red wines ... </div>
        </div>
      </body>
      <back>
        <div type="index"> <!-- … --> </div>
      </back>
    </text>
  • 56.
    Numbered Divisions within Body
    <body>
      <div1 type="part" n="1">
        <div2 type="chapter" n="1"> <!-- text of part 1, chapter 1 --> </div2>
      </div1>
      <div1 type="part" n="2">
        <div2 type="chapter" n="2"> <!-- text of part 2, chapter 2 --> </div2>
      </div1>
    </body>
    The largest possible subdivision of the body is <div1> and the smallest possible is <div7>.
    You CANNOT mix unnumbered and numbered divisions (i.e. <div> with <div1> etc.).
    Regardless, a good practice is to use n= and/or xml:id= attributes. For example: <div type="chapter" n="1">
    NOTE: n= is optional, and its value may be repeated (unlike xml:id, it does not have to be unique)
  • 57.
    Using <group> for Editions
    <text>
      <front><!-- ... --></front>
      <group>
        <text><!-- ... --></text>
        <text><!-- ... --></text>
        <text><!-- ... --></text>
      </group>
      <back><!-- ... --></back>
    </text>
  • 58.
    Drama
    <div1 n="I" type="Act">
      <head>Act I</head>
      <div2 type="scene">
        <head>Scene 1</head>
        <stage type="entrance">Enter Fay</stage>
        <sp> <speaker>Fay</speaker> <p>I say, Dinah, has anyone seen my gloves?</p> </sp>
        <stage type="entrance">Enter Dinah</stage>
        <sp> <speaker>Dinah</speaker> <p>No, miss, perhaps the parakeet has got them again?</p> </sp>
        <stage type="exit">Exit Fay and Dinah</stage>
      </div2>
    </div1>
  • 59.
    Letters
    <div type="letter">
      <opener>
        <dateline> <date when="1865-08-05">August the 5th</date> <name type="place">Cape Cod</name> </dateline>
        <salute>My dear <name type="person">Becky</name></salute>
      </opener>
      <p>How lovely the oysters are this evening!</p>
      <closer>
        <salute>Yours very truly</salute>
        <signed><name type="person">Maria</name></signed>
      </closer>
    </div>
  • 60.
    Verse
    <lg type="sonnet">
      <head>On First Looking into Chapman’s Homer</head>
      <lg type="quatrain">
        <l>Much have I travell’d in the realms of gold,</l>
        <l>And many goodly states and kingdoms seen;</l>
        <l>Round many western islands have I been</l>
        <l>Which bards in fealty to <persName>Apollo</persName> hold.</l>
      </lg>
      <lg type="quatrain">
        <l>Oft of one wide expanse had I been told</l>
        <l>That deep-brow’d <persName>Homer</persName> ruled as his demesne;</l>
        <l>Yet did I never breathe its pure serene</l>
        <l>Till I heard <persName>Chapman</persName> speak out loud and bold:</l>
      </lg>
      <lg type="sestet">
        <l>Then felt I like some watcher of the skies</l>
        <l>When a new planet swims into his ken;</l>
        <l>Or like stout <persName>Cortez</persName> when with eagle eyes</l>
        <l>He star’d at the <placeName>Pacific</placeName>—and all his men</l>
        <l>Look’d at each other with a wild surmise—</l>
        <l>Silent, upon a peak in <placeName>Darien</placeName>.</l>
      </lg>
    </lg>
    <persName> and <placeName> are optional for Level 3; used in Level 4.
  • 61.
    Verse Example
    Warp Speed, Ms. Bright!
    There was a young lady named Bright,
    Who travelled much faster than light,
    She departed one day,
    In a relative way,
    And returned on the previous night.
  • 62.
    TEI P5 Level 5 Rendering
    <?xml version="1.0" encoding="UTF-8"?>
    <lg type="limerick" rhyme="aabba" n="3">
      <head>Warp Speed, Ms. Bright!</head>
      <l>There was a young lady named <rhyme label="a">Bright</rhyme>,</l>
      <l>Who travelled much faster than <rhyme label="a">light</rhyme>,</l>
      <l>She departed one <rhyme label="b">day</rhyme>,</l>
      <l>In a relative <rhyme label="b">way</rhyme>,</l>
      <l>And returned on the previous <rhyme label="a">night</rhyme>.</l>
    </lg>
  • 63.
    PART 7: Some Common Practices in Text Encoding
  • 64.
    Using oXygen
    1. Make sure you are linked to a schema:
       <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
    2. Include as many of the header elements as possible, including <editorialDecl>:
       <editorialDecl>
         <hyphenation eol="none">
           <p>End-of-line hyphens have been removed and the hyphenated words rejoined</p>
         </hyphenation>
       </editorialDecl>
    3. Make use of spell-check: F4 key
    4. Delete end-of-line hyphens within words, except in special cases (e.g. poetry, dramatic verse): Institu-tion becomes Institution
  • 65.
    Common Practices (continued)
    5. Include <respStmt> in the header for any emendations or corrections you later make to a text that has been previously encoded:
       <respStmt>
         <name xml:id="rlw54">Richard Wisneski</name>
         <!-- one resp per respStmt -->
         <resp>TEI Header creator</resp>
         <!-- OR TEI Header and document creator -->
       </respStmt>
    6. Page breaks must be inserted, using <pb n="[page number]" xml:id="[identifier]"/>:
       <pb xml:id="fleboo-032" n="33"/>
       OR, if you want the page break to reference the specific page image:
       <pb facs="fleboo-032.jp2" n="33"/>
       The xml:id attribute MUST be (a) unique and (b) start with a letter
       The facs attribute MAY link to a permanent URL or URI
  • 66.
    Inserting Images
    As part of a page:
    <figure>
      <figDesc>An illuminated page from the de Brailes Hours, containing a historiated initial with a signed self-portrait of William de Brailes</figDesc>
      <graphic url="./gfx/debrailes_ms.jpg" height="600px"/>
    </figure>
    <figDesc> is not required in Level 3, but we are using it to capture either an image caption or to describe the image if a caption is not present
  • 67.
  • 68.
    Footnote Encoded (and Marginalia)
    <p>FIRST ANNUAL REPORT<note place="foot" rend="*">This was the first report made after the schools were regularly organized under the ordinance. The Bethel School mentioned in the opening paragraph had existed through a part of the previous year, the year 1836, and a Board appointed to look after its interests, had made an informal — probably oral — report.</note> OF THE BOARD OF MANAGERS OF COMMON SCHOOLS.</p>
    If this note were in the MARGIN of the page, it would be encoded, for example:
    <note type="auth" place="margin-left">text, text, text</note>
    The type= and rend= attributes are optional
  • 69.
    Endnotes
    In the body of the text:
    <p>He liked to eat pie<ptr target="#note1" rend="1"/>.</p>
    OR…
    <p>See <ref target="#note1">Note 30</ref></p>
    At the end of the text:
    <div type="endnotes">
      <pb n="130"/>
      <p xml:id="note1">Pie is a dessert often eaten.</p>
    </div>
  • 70.
  • 71.
    Encode with Pointer and Link
    <l>(Diff'rent our parties, but with equal grace</l>
    <l>The Goddess smiles on Whig and Tory race, <ptr rend="unmarked" target="#note3.284"/></l>
    <l>'Tis the same rope at sev'ral ends they twist,</l>
    <l>To Dulness, Ridpath is as dear as Mist)</l>
    <note xml:id="note3.284" type="imitation" place="foot" anchored="false">
      <bibl>Virg. Æn. 10.</bibl>
      <quote>
        <l>Tros Rutulusve fuat; nullo discrimine habebo.</l>
        <l>—— Rex Jupiter omnibus idem.</l>
      </quote>
    </note>
  • 72.
    PART 8: Encoding References
  • 73.
    Encoding Contextual Information
    Definition: Things we know about the content of the text that we want to be able to state explicitly, to add value to the text or assist the reader in understanding it better, such as:
      Authority control: information about the identity of things named in the text (people, places, books, etc.)
      Additional information: birthdates, geographical locations, date published, etc.
      Interpretive information: themes, keywords
      Normalization of measurements, dates, etc.
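Normalization is the item on this list that is hardest to picture, so here is a minimal sketch. The sentence, the date, and the #george_beverly identifier are invented for the example; the idea is that the transcription keeps the original wording while the attributes carry a regularized value.

```xml
<!-- Hypothetical sentence; attribute values are illustrative -->
<p>The sale of
  <measure quantity="10" unit="barrel" commodity="rum">ten Rum Barels</measure>
  was recorded on
  <date when="1764-03-02">the second of March</date> by
  <persName ref="#george_beverly">George Beverly</persName>.</p>
```

Searching or sorting can then run on the when and quantity values instead of on the spelling found in the source.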
  • 74.
    Common Tags for Contextual Information
    Names:
      <persName>Baron Olivier of Brighton</persName>
      <placeName>New York</placeName>
      <orgName>Podunk Sewing Club</orgName>
    Linguistic: <foreign>, <distinct>, <soCalled>, <mentioned>, <term>, <emph>
      <distinct>dinna ken</distinct> why that <foreign xml:lang="fr">soi-disant</foreign> <soCalled>expert</soCalled> must be <emph>so</emph> particular about pronouncing <mentioned xml:lang="cy">Llandaff</mentioned> using a <term>voiceless lateral fricative</term>
  • 75.
    Types of Contextual Information
    ’Ographies: prosopography (personography), gazetteers (placeography), orgography, bibliography. These are like local authority lists that you create.
    Keywords applied to the text as a whole
    Thematic or interpretive information applied to specific places in the text
  • 76.
    Personography
    Like a local name authority file
    Can be simple or very detailed
    Can be kept in your encoded file or externally
    Includes specific elements for the most common data
    Also includes general elements for the unforeseen
  • 77.
    Personography Encoding
    Nesting: TEI header, profile description, participation description, listPerson, person
    <teiHeader>
      <!-- ... -->
      <profileDesc>
        <particDesc>
          <listPerson>
            <person xml:id="andrew_j_steere">
              <persName>Steere, Andrew J.</persName>
              <birth when="1844"> <placeName ref="#l_scituate">Scituate, RI</placeName> </birth>
              <death notBefore="1918"/>
            </person>
            <person xml:id="george_pope_morris">
              <persName>Morris, George Pope</persName>
              <birth when="1802"> <placeName>Philadelphia, PA</placeName> </birth>
              <death when="1864"/>
            </person>
          </listPerson>
        </particDesc>
      </profileDesc>
    </teiHeader>
    <text>
      <body>
        <p>...However, the plea of Woodman spare that tree and the patriotic pride of the owner, <persName ref="#andrew_j_steere">Mr. Andrew J. Steere</persName>, had guaranteed its safety from the woodsman’s axe.</p>
      </body>
    </text>
  • 78.
    Placeography (Gazetteer)
    Very similar to personography... but for places
    Can be linked to maps via geographic information data
  • 79.
    Placeography Encoding
    Nesting: back, div, listPlace, place
    <body>
      <p>The tree stood about a mile east of <placeName ref="#l_chepachet">Chepachet</placeName> and a mile north of <placeName ref="#l_spring_grove">Spring Grove</placeName> ... </p>
      <!-- ... -->
    </body>
    <back>
      <div type="editorial">
        <listPlace>
          <place type="state" xml:id="l_rhode_island">
            <placeName>The State of Rhode Island and Providence Plantations</placeName>
            <country>United States of America</country>
            <region>New England</region>
          </place>
          <place type="settlement" xml:id="l_chepachet">
            <placeName>Chepachet</placeName>
            <region ref="#l_rhode_island"/>
            <location> <geo>41.915131 -71.671397</geo> </location>
          </place>
          <place type="settlement" xml:id="l_spring_grove">
            <placeName>Spring Grove</placeName>
            <region ref="#l_rhode_island"/>
            <location> <geo>41.905583 -71.656219</geo> </location>
          </place>
        </listPlace>
      </div>
    </back>
  • 80.
    Interpretative Keywords and Themes
    To associate a keyword or interpretive concept with a word, phrase, or passage of text:
    <body>
      <div type="section">
        <p>However, the plea of <quote>Woodman spare that tree</quote> and the <seg ana="#patriotism">patriotic pride of the owner</seg>, <persName>Mr. Andrew J. Steere</persName>, had <seg ana="#conservation">guaranteed its safety from the woodsman’s axe</seg>...</p>
      </div>
    </body>
    <back>
      <div type="editorial">
        <interpGrp>
          <interp xml:id="ri_history">Rhode Island local history</interp>
          <interp xml:id="patriotism">Patriotism and references to the war effort</interp>
          <interp xml:id="commercial">References to commercial harvesting and use of trees</interp>
          <interp xml:id="conservation">Conservation efforts and protection of trees</interp>
          <interp xml:id="arboriculture">References to tree species and their cultivation</interp>
        </interpGrp>
      </div>
    </back>
  • 81.
    PART 9: Some Complex Situations in Text Encoding
  • 82.
    Overlapping
    Overlapping occurs especially with older texts. XML elements may not overlap, but document structures often do. Examples include:
      Physical features (pages, columns, lines) and textual features (paragraphs, names)
      Verse lines and quotations, names, or other phrasal elements
      Verse lines and linguistic features
      Dramatic speeches and verse lines
      Handwritten additions or deletions and other structures
      Typographical features and linguistic features
  • 83.
    Example
    Mortal, she said, "I'm sent to you,
    Then hold my precepts fast;
    Remember earth’s best joys are few,
    And can’t for ever last."
  • 84.
    One Possible Solution
    <lg type="stanza">
      <l>Mortal, she said, <said xml:id="s01" next="#s02">I'm sent to you,</said></l>
      <l><said xml:id="s02" next="#s03" prev="#s01">Then hold my precepts fast;</said></l>
      <l><said xml:id="s03" prev="#s02" next="#s04">Remember earth’s best joys are few,</said></l>
      <l><said xml:id="s04" prev="#s03">And can’t for ever last.</said></l>
    </lg>
  • 85.
    Transcriptional Complexities
    <p>Johnston etc 1764 Mr Nikl<unclear>e</unclear> <supplied>s</supplied><gap reason="folded" extent="unknown"/> Brown <unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear> to ten Rum Barels at Four pound &per; Barel — — — £40</p>
  • 86.
    Milestones
    A milestone marks a boundary point separating any kind of section of a text where the change is not represented by a structural element.
    Used for page breaks, signatures, line breaks, and column breaks.
    Example:
    <p> <!-- the paragraph begins on the previous page -->
      <pb n="249"/> <milestone unit="sig" n="R5r"/>
      <lb/>digested. Its long trunk, as seen slanting down from
      <lb/>out of the building across the wharf and into the ship, …
      <lb/>will have been—a farthing.</p>
    <pb n="250"/> <milestone unit="sig" n="R5v"/>
  • 87.
    Fragmentation
    The part attribute is available on elements such as <l>, <lg>, <ab>, and <seg>.
    Use it to count lines or indicate metrical patterns when one unit is divided into parts, for example a verse line shared between speakers: I = initial part, M = medial part, F = final part.
    <sp> <speaker>Leo.</speaker>
      <l part="F">Go on, go on:</l>
      <l>Thou canst not speake too much, I have deserv’d</l>
      <l part="I">All tongues to talk their bittrest.</l>
    </sp>
    <sp> <speaker>Lord.</speaker>
      <l part="F">Say no more;</l>
      <l>How ere the business goes, you have made fault</l>
      <l part="I">I’th boldnesse of your speech.</l>
    </sp>
    <sp> <speaker>Pauline.</speaker>
      <l part="F">I am sorry for’t;</l>
      <l>All faults I make, when I shall come to know them</l>
      <!-- ... -->
    </sp>
  • 88.
  • 89.
    Encoding Parallel Structures
    <lg type="stanza" xml:lang="fr">
      <l xml:id="fr2.01" corresp="#en2.01">Nos péchés sont têtus, nos repentirs sont lâches;</l>
      <l xml:id="fr2.02" corresp="#en2.02">Nous nous faisons payer grassement nos aveux,</l>
      <l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
      <l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
    </lg>
    <lg type="stanza" xml:lang="en">
      <l xml:id="en2.01" corresp="#fr2.01">Our sins are stubborn, craven our repentance.</l>
      <l xml:id="en2.02" corresp="#fr2.02">For our weak vows we ask excessive prices.</l>
      <l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
      <l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
    </lg>
  • 90.
    Textual Splitting: <choice>
    <p>...with them, bycause they woulde
      <lb/>not be <choice> <abbr>boūde</abbr> <expan>bounde</expan> </choice> also for an other wo[see below]
      <lb/>mā at theyr pleasure, whom they
      <lb/>knewe not, nor yet what matter
      <lb/>was layed unto her charge. Not
      <lb/>wythstandynge at the laste, after
      <lb/>moche a do and reasonyng to and
      <lb/>fro, they toke a bonde of them of
      <lb/>recognisaunce for my fourth com
      <lb/>mynge. And thus I was at the
      <lb/>last, <choice> <orig>delyuered</orig> <reg>delyvered</reg> </choice>. Written by me An
      <lb/>ne Askewe. </p>
  • 91.
    Textual Splitting (continued)
    The word marked "wo[see below] <lb/>mā" in the passage "also for an other wo[see below] <lb/>mā at theyr pleasure, whom they" can be encoded with nested <choice> elements:
    <choice>
      <abbr>
        <choice> <sic>wo<lb/>mā</sic> <corr>wo-<lb/>mā</corr> </choice>
      </abbr>
      <expan>
        <choice> <sic>wo<lb/>man</sic> <corr>wo-<lb/>man</corr> </choice>
      </expan>
    </choice>
    You can put <choice> inside another element, such as a name, or outside it (nesting)
  • 92.
    Unclear Texts
    For an unclear reading, you can put multiple <unclear> elements within <choice>.
    Use the <unclear>, <supplied>, and <gap> elements for what you can't read in the written manuscript.
    Example: Brown <unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear>
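The first point, several candidate readings inside one <choice>, might be sketched like this. The alternative spelling and the cert values are invented to show the pattern and do not come from the manuscript.

```xml
<!-- Hypothetical alternative readings of one hard-to-read word -->
<p>to me George
  <choice>
    <unclear cert="high">Beverly</unclear>
    <unclear cert="low">Beverley</unclear>
  </choice>
  juner</p>
```

The cert attribute is one way to say which reading the encoder considers more likely; leaving it off simply presents the readings as equal alternatives.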
  • 93.
    <teiHeader>
      <fileDesc>
        <!--... -->
        <notesStmt>
          <note>The historiated capital on folio 43r contains a signed self-portrait of William de Brailes. The text is richly illuminated ...</note>
        </notesStmt>
        <!-- ... -->
      </fileDesc>
    </teiHeader>
    Descriptive Prose for Images
  • 94.
    Formalization for Images
    <physDesc>
      <objectDesc form="codex">
        <supportDesc material="vellum">
          <extent>
            <dimensions>
              <height quantity="150" unit="mm"/>
              <width quantity="124" unit="mm"/>
            </dimensions>
          </extent>
        </supportDesc>
        <layoutDesc>
          <layout columns="1" ruledLines="12"/>
        </layoutDesc>
      </objectDesc>
      <handDesc>
        <handNote scribe="william_debrailes" script="carolingian" medium="ink" scope="sole"/>
      </handDesc>
    </physDesc>
    Practice
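The formal description can be queried by a program, which a prose note cannot. As a rough illustration, the Python sketch below reads the dimensions and layout from a namespace-free copy of the physDesc fragment. The `describe` function and the shape of the dictionary it returns are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the fragment from the slide (handDesc omitted).
SAMPLE = """<physDesc>
  <objectDesc form="codex">
    <supportDesc material="vellum">
      <extent>
        <dimensions>
          <height quantity="150" unit="mm"/>
          <width quantity="124" unit="mm"/>
        </dimensions>
      </extent>
    </supportDesc>
    <layoutDesc><layout columns="1" ruledLines="12"/></layoutDesc>
  </objectDesc>
</physDesc>"""


def describe(xml_text):
    """Collect a few machine-readable facts from a <physDesc> fragment."""
    root = ET.fromstring(xml_text)
    height = root.find(".//dimensions/height")
    width = root.find(".//dimensions/width")
    layout = root.find(".//layout")
    return {
        "form": root.find("objectDesc").get("form"),
        "material": root.find(".//supportDesc").get("material"),
        "height": (int(height.get("quantity")), height.get("unit")),
        "width": (int(width.get("quantity")), width.get("unit")),
        "ruled_lines": int(layout.get("ruledLines")),
    }


print(describe(SAMPLE))
```

Across a whole collection, the same lookup could answer questions such as "which codices are on vellum and under 200 mm tall", provided every record is encoded this consistently.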
  • 95.
    PART 10: TEI Guidelines
  • 96.
    The outward expression of TEI to the user is the TEI Guidelines: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html
    These guidelines include schemas (formal rules for encoding documents) and documentation that explains how to apply these rules.
    Customization makes it possible to apply TEI markup as strictly or as loosely as one wishes.
    TEI Guidelines = Your instruction manual
    TEI Guidelines
  • 97.
    Topically: into chapters and modules
    Semantically and functionally: into classes
    Pedagogically: into discussion and reference
    TEI provides the tools. You have to ask: What is my project? What are my needs? What level of granularity do I want?
    TEI Guidelines Divisions
  • 98.
    Some modules (e.g. TEI header) are required: core elements
    Some modules provide genre- or discipline-specific elements: e.g., the verse module contains <rhyme> and <caesura>
    Some modules provide functionality: e.g., the linking module contains markup for encoding arbitrary passages and noting links between them
    Chapters and Modules
  • 99.
    Classes are functional or semantic groups of elements. Two kinds:
    model class: elements that appear in the same place in the logical structure
      e.g., model.milestoneLike contains those milestone elements used to describe the physical and typographic structure of a codex: pb, cb, lb, and milestone
    attribute class: elements that share a common attribute
      e.g., att.ascribed is the class of elements that share the who attribute, i.e. can be attributed to an individual, including change, q, and sp
    Classes
  • 100.
    Classes are composed of elements and sometimes other classes:
    elements and attribute classes can be members of attribute classes
      a given element (or attribute class) may be a member of more than one attribute class
    elements, model classes, or attribute classes can be members of model classes
      a given element (or class) may be a member of more than one model class
    Class Constituency
  • 101.
    Datatypes are constraints on (attribute) values. They may limit values to things like:
    a supplied list of possible values: e.g. sex of person
    a user-provided list of possible values: e.g. type of div
    strings that conform to a specific format: e.g. when of date, extent of gap, or xml:lang
    Datatypes
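To make the kinds of constraint concrete, here is a small Python sketch that imitates two of them: a closed list of values and a string format. Everything in it is a stand-in for illustration: the value list for type on div is invented, the date pattern is a simplified year, year-month, or year-month-day form, and in practice the schema generated from the TEI customization enforces these rules, not hand-written code.

```python
import re

# Hypothetical project vocabulary (assumption, not a TEI-supplied list).
CLOSED_LISTS = {("div", "type"): {"chapter", "poem", "letter"}}

# Simplified date format: YYYY, YYYY-MM, or YYYY-MM-DD (assumption; the
# real datatype for the when attribute accepts more forms than this).
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def check_attribute(element, attribute, value):
    """Return True if the value satisfies the sketch's constraint, if any."""
    allowed = CLOSED_LISTS.get((element, attribute))
    if allowed is not None:
        return value in allowed  # list-of-values constraint
    if (element, attribute) == ("date", "when"):
        return DATE_PATTERN.fullmatch(value) is not None  # format constraint
    return True  # unconstrained in this sketch


print(check_attribute("div", "type", "chapter"))
print(check_attribute("div", "type", "sonnet"))
print(check_attribute("date", "when", "1987-11"))
print(check_attribute("date", "when", "Nov 1987"))
```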
  • 102.
    TEI Guidelines… contain:
    a prose discussion of encoding topics,
    a reference section, and
    schemas that constrain encoding
    One Document Does It All (ODD)
  • 103.
  • 104.
    TEI Under the hood http://www.tei-c.org/Roma/
  • 105.
    Roma, the web interface to the TEI customization mechanism, performs two functions:
    Provides a user interface for editing TEI ODD files
    Provides a user interface for generating TEI schemas and documentation
    ROMA, The Web Tool
  • 106.
    Select which modules to use and not to use
    Choose what elements in those modules to use and not to use
    Choose what attributes on those elements to use and not to use
    Change element or attribute names
    Restrict the values of attributes (IMPORTANT)
    Constrain structure
    Add new elements
    Add new attributes
    Produce an internationalized version of the TEI
    Customization Options
  • 107.
    Customize:
    Values for type= attributes
    Values for any kind of information for which a controlled vocabulary exists
    However:
    Do NOT remove the header
    Do NOT remove required elements from the header
    Do NOT duplicate things which the TEI has already provided
    What To Customize and Not Customize
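Restricting the values of an attribute ends up as a small declaration in the ODD customization file. The Python sketch below builds a fragment of roughly that shape for type on div. The element and attribute names (elementSpec, attList, attDef, valList, valItem) follow the TEI ODD vocabulary as I understand it, the value list is invented, and the output should be checked against an ODD file that Roma itself produces before being relied on.

```python
import xml.etree.ElementTree as ET


def restrict_attribute(element, attribute, allowed_values):
    """Build an ODD-style <elementSpec> that closes one attribute's value list."""
    spec = ET.Element("elementSpec", {"ident": element, "mode": "change"})
    att_list = ET.SubElement(spec, "attList")
    att_def = ET.SubElement(att_list, "attDef", {"ident": attribute, "mode": "change"})
    # type="closed" is meant to say that only the listed values are allowed.
    val_list = ET.SubElement(att_def, "valList", {"type": "closed", "mode": "replace"})
    for value in allowed_values:
        ET.SubElement(val_list, "valItem", {"ident": value})
    return spec


# Hypothetical controlled vocabulary for a project's divisions.
spec = restrict_attribute("div", "type", ["chapter", "poem", "letter"])
print(ET.tostring(spec, encoding="unicode"))
```

A closed list like this is what lets a validating editor flag type="chaptr" as an error instead of silently accepting it.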
  • 108.
  • 109.
    STEP 1: Enter (1) a title, (2) a filename (all one word, no special characters), (3) an author name, and (4) a description. Leave everything else as is.
    STEP 2: Click “submit”
    STEP 3: Click the MODULES tab
    MODULES tab
  • 110.
    1. Do NOT remove the “List of selected Modules” given to you (but click the link to change attributes)
    2. Click links under “Module Name” to:
      a. See what elements are available
      b. Exclude elements
      c. Rename elements
      d. Change attributes
    3. To change values for attributes, click the attribute itself.
    4. Click “add” on the left-hand side to add elements with their attributes
  • 111.
    Clicking “add” inserts this module here. Next, click “namesdates”
  • 112.
    You can exclude elements
    Click to change attributes
  • 113.
    Click on these links to change the values for the attributes
    When you’re done changing elements and attributes, click here
  • 114.
    Change the values for the attributes (type=)
    Click if no changes are made (bypass “submit query”)
    Click when finished
  • 115.
    Return to your list; note that if you changed any attributes, the “changes” column indicates this.
    Click the “Schema” tab
  • 116.
    Best is to choose either “Relax NG schema (compact syntax)” or “W3C schema”
    Click “Documentation” to save documentation for this schema
    Click “Sanity Checker” to make sure your schema validates
    NOTE: You can return to the Roma tool later to edit your schema
  • 117.
  • 118.
    More and better documentation
    More use (and support for use) by individuals
    More discipline-specific customizations
    Future Trends in TEI
  • 119.
    Historical Event Markup Language (HEML): http://www.heml.org/heml-cocoon/
    Music Markup Language: http://www.musicmarkup.info/
    Multi-Element Coding System: http://helmer.hit.uib.no/claus/mecs/mecs.htm
    Other Encoding Possibilities
  • 120.
    WWP Guide to Scholarly Text Encoding: http://www.wwp.brown.edu/encoding/guide/index.html
    TEI web site: http://www.tei-c.org/index.xml
    The TEI listserv (TEI-L)
    TEI Wiki: http://www.tei-c.org/wiki/index.php/Main_Page
    Teach Yourself TEI: http://www.tei-c.org/Support/Learn/tutorials.xml
    Guidelines for Text Encoding and Interchange: http://quod.lib.umich.edu/t/tei/
    A Gentle Introduction to XML: http://www.tei-c.org/release/doc/tei-p4-doc/html/SG.html
    A Companion to Digital Literary Studies: http://www.digitalhumanities.org/companion/DLS/
    References

Editor's Notes

  • #3 Akin to learning a new language
  • #6 Show search features of each. For Whitman, click “manuscripts”, then click here (under “poetry manuscripts”)
  • #7 The TEI Consortium was founded in 2000. Members pay an annual fee, which pays for editorial work, outreach, and workshops. KSL-CWRU is a member.
  • #8 Text encoding was born out of New Criticism, but is more structuralist in nature. Regarding the 1st point, think of text encoding as akin to an edition of a text. Regarding the 2nd point, there is no one right answer, but there do exist wrong answers. Regarding the 3rd point, it is expected that individual projects will remove elements, constrain attribute values, add new elements, and even import schemas from other namespaces.
  • #9 Regarding the 1st point: text encoding uses XML because it’s non-proprietary, requires no specialized software or hardware, and is meant to be long-lasting. 2nd point: have an agreed-upon metadata and markup language that will work across collections and projects. 3rd point: these texts are not static, but rather meant to be built upon by a community of scholars.
  • #12 TEI grew out of a need to create international standards for textual markup in 1987. Members pay an annual fee, which pays for editorial work, outreach, and workshops. KSL-CWRU is a member. TEI is intended to serve an international community: a broad range of methods and approaches; participation from member institutions around the world; support for multilingual versions of the TEI Guidelines (Chinese, French, German, Japanese, Spanish, others in the future).
  • #14 Code specifications include: every element has a start and end tag; no elements overlap; the document has a single root element (e.g. book; see upcoming slide).
  • #16 NOTES: Element names ARE case-sensitive. Elements are also known as “tags”. Attributes are to elements as adjectives are to nouns. Elements have an open and close tag, except for empty elements, such as <pb />. Elements must be properly nested.
  • #18 We’ll use the Roma tool for this later on
  • #19 Not too important to understand all of this. GO TO PRACTICE
  • #21 Began in 1994. Major shift occurred in 2002 with P4 encoding. LEVEL 1: Texts at Level 1 can be created and encoded by fully automated means, using uncorrected OCR of page images ("dirty OCR"), exporting from existing electronic text files, or actually not including any text at all. These texts are not intended to be adequate for textual analysis; they are more likely to be suited to the goals of a preservation unit or mass digitization initiative. LEVEL 2: Level 2 encoding requires some human intervention to identify each textual division and heading. Level 2 texts do not require any specialist knowledge or manual intervention below the section level. LEVELS 1 AND 2 are both not meant to have the text stand apart from the page images. LEVEL 3: first attempt to have text stand alone from page images.
  • #23 <ab> = anonymous block
  • #25 <ab> = anonymous block; <fw> = forme work
  • #26 <front>[titlepage information, table of contents, prefaces, etc.][optional]</front> <ab> = anonymous block, NOT <p> tags. No <p> tags. The facs attribute is used without a METS record; the xml:id attribute is used WITH a METS document.
  • #27 <front>[titlepage information, table of contents, prefaces, etc.][optional]</front> <ab> = anonymous block, NOT <p> tags. No <p> tags. Not a good idea to use full file paths for the facs= attribute.
  • #29 This is the level KSL is using
  • #31 N.B. You can also use numbered divs. The maximum is 7. The example to the left is invalid; the <div1> and <div2> tags are there just to show that the option exists.
  • #32 The n= attribute for <l> is optional
  • #34 This is the level KSL is using
  • #35 Click the link to see the full example. HAND OUT “SOME COMMON P5 TAGS”
  • #40 Ask: what do you think would need to be encoded here?
  • #41 Ask: what do you think would need to be encoded here?
  • #42 <front>[titlepage information, table of contents, prefaces, etc.][optional]</front> <ab> = anonymous block, NOT <p> tags. <fw> = forme work. No <p> tags. Not good practice to use file paths for the facs= attribute.
  • #43 <pb> comes after the <div>. <fw> removed. xml:id is used with a METS document; facs= is used without a METS document.
  • #44 <hi rend="italics">: the rend attribute is optional
  • #45 <biblStruct> can be in the TEI header or in a separate TEI file, referenced in this TEI document (makes more sense to do the latter). Take note of <q> (can be missed in this example). GO TO PRACTICE
  • #47 In the local context, a TEI Header gives metadata about the TEI document, its source, and its provenance. The TEI Header may be used for metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI Headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generation of other formats may require human intervention because they often are more granular, or have different granularity, than TEI Headers.
  • #48 In the local context, a TEI Header gives metadata about the TEI document, its source, and its provenance. The TEI Header may be used for metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI Headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generation of other formats may require human intervention because they often are more granular, or have different granularity, than TEI Headers.
  • #49 Distribute spreadsheet
  • #50 Show how I got to the MARC display. Be aware that other components may have to go into the header, depending on your project (e.g. working with verse). Also requires appropriate schema elements and attributes. GO TO PRACTICE TO CREATE A TEI HEADER
  • #53 Distribute spreadsheet
  • #56 Repeat div=chapter for each chapter. Hand out “P5 General Recommendations” from spreadsheet. TEI provides the tools. You have to ask: what is my project? What are my needs? What level of granularity do I want?
  • #57 Choice on what to do depends on the complexity of the material with which you are working. Attribute n= is optional, and repeatable.
  • #58 For editions within editions. PUT EXAMPLE SLIDE IN HERE
  • #61 <persName> and <placeName> are optional (for Level 4)
  • #63 DO PRACTICE USING A POEM. Check for well-formedness, NOT validity.
  • #66 We are using xml:id. n= is the page number on the page.
  • #69 GO TO REFERENCE PRACTICE
  • #83 There are at least a dozen solutions to overlapping
  • #87 Graphic and pb tags are EMPTY elements
  • #89 Graphic and pb tags are EMPTY elements
  • #90 Graphic and pb tags are EMPTY elements
  • #91 <choice> <abbr> <choice> <sic>wo<lb/>mā</sic> <corr>wo-<lb/>mā</corr> </choice> </abbr> <expan> <choice> <sic>wo<lb/>man</sic> <corr>wo-<lb/>man</corr> </choice> </expan> </choice>
  • #92 Textual Splitting: parallelism at a more local level. Choice option: see the abbreviated version or the expanded version. You can encode both. You may want to show typographical errors. You may want to show normalization/old style stuff (e.g. old typography).
  • #93 Graphic and pb tags are EMPTY elements
  • #94 Notice that this goes in the TEI HEADER. This, too, is an informationally weak approach.
  • #95 This approach is more powerful than approach #1. This approach is part of the header. PRACTICE: ENCODE IMAGE
  • #97 TEI Guidelines: can be applied strictly or loosely; can adapt to local conditions; are designed as a set of modules that can be selected as needed; are not unlike a human language in some respects.
  • #100 Club analogy: being a member of a model class is like being a member of a workgroup: you may be called upon for certain tasks; being a member of an attr class is like being a member of a club with member benefits (i.e., attributes)
  • #105 ROMA is the schema tool
  • #106 Go to the ROMA tool and spend some time with this
  • #111 Delete most elements from most modules; delete the key attribute from name; delete most attrs from att.global and att.global.linking; constrain type of div or name.