Xml Case Learns 2008

PPT presentation on XML, including namespaces, DTD, and Schemas


© All Rights Reserved

  • Akin to learning a new language
  • Show the search features of each site. For Whitman, click “manuscripts” (under “poetry manuscripts”).
  • The TEI Consortium was founded in 2000. Members pay an annual fee, which pays for editorial work, outreach, and workshops. KSL-CWRU is a member.
  • Text encoding was born out of New Criticism, but is more structuralist in nature. Regarding the 1st point, think of text encoding as akin to an edition of a text. Regarding the 2nd point, there is no one right answer, but there do exist wrong answers. Regarding the 3rd point, it is expected that individual projects will remove elements, constrain attribute values, add new elements, and even import schemas from other namespaces.
  • Regarding the 1st point: text encoding uses XML because it is non-proprietary, requires no specialized software or hardware, and is meant to be long-lasting. 2nd point: have an agreed-upon metadata and markup language that will work across collections and projects. 3rd point: these texts are not static, but rather meant to be built upon by a community of scholars.
  • The TEI grew out of a need to create international standards for textual markup in 1987. Members pay an annual fee, which pays for editorial work, outreach, and workshops. KSL-CWRU is a member. The TEI is intended to serve an international community:
    • Broad range of methods and approaches
    • Participation from member institutions around the world
    • Support for multilingual versions of the TEI Guidelines: Chinese, French, German, Japanese, Spanish, others in the future
  • Code specifications include: every element has a start and end tag; no elements overlap; the document has a single root element (e.g. book; see upcoming slide).
  • NOTES: Element names ARE case-sensitive. Elements are also known as “tags”. Attributes are to elements as adjectives are to nouns. Elements have an opening and a closing tag, except for empty elements such as <pb/>. Elements must be properly nested.
  • We’ll use the Roma tool for this later on
  • Not too important to understand all of this. GO TO PRACTICE
  • Began in 1994. A major shift occurred in 2002 with P4 encoding. LEVEL 1: Texts at Level 1 can be created and encoded by fully automated means, using uncorrected OCR of page images (“dirty OCR”), by exporting from existing electronic text files, or by not including any text at all. These texts are not intended to be adequate for textual analysis; they are more likely to suit the goals of a preservation unit or mass digitization initiative. LEVEL 2: Level 2 encoding requires some human intervention to identify each textual division and heading. Level 2 texts do not require any specialist knowledge or manual intervention below the section level. Neither Level 1 nor Level 2 is meant to have the text stand apart from the page images. LEVEL 3: the first level at which the text is meant to stand alone from the page images.
  • <ab> = anonymous block
  • <ab> = anonymous block; <fw> = forme works
  • [title page information, table of contents, prefaces, etc.] [optional]; <ab> = anonymous block, NOT tags. No tags. The facs attribute is used without a METS record; the xml:id attribute is used WITH a METS document.
  • [title page information, table of contents, prefaces, etc.] [optional]; <ab> = anonymous block, NOT tags. No tags. It is not a good idea to use full file paths for the facs= attribute.
  • This is the level KSL is using
  • N.B. You can also use numbered divs (<div1> through <div7>); the maximum is 7. The example to the left is invalid; the and tags are there just to show that the option exists.
  • N= attribute for is optional
  • This is the level KSL is using
  • Click the link to see the full example HAND OUT “SOME COMMON P5 TAGS”
  • Ask: what do you think would need to be encoded here?
  • [title page information, table of contents, prefaces, etc.] [optional]; <ab> = anonymous block, NOT tags; <fw> = forme works. No tags. It is not good practice to use file paths for the facs= attribute.
  • comes after the removed. xml:id is used with a METS document; facs= is used without a METS document.
  • the rend attribute is optional
  • can be in the TEI header or in a separate TEI file, referenced in this TEI document (makes more sense to do the latter). Take note of (can be missed in this example). GO TO PRACTICE
  • In the local context, a TEI Header gives metadata about the TEI document, its source, and its provenance. The TEI Header may be used for metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI Headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generating those formats may require human intervention because they are often more granular than TEI Headers, or have a different granularity.
  • Distribute spreadsheet
  • Show how I got to the MARC display. Be aware that other components may have to go into the header, depending on your project (e.g. working with verse); these also require appropriate schema elements and attributes. GO TO PRACTICE TO CREATE A TEI HEADER
  • Distribute spreadsheet
  • Repeat div=chapter for each chapter. Hand out “P5 General Recommendations” from spreadsheet TEI provides the tools. You have to ask: what is my project? What are my needs? What level of granularity do I want?
  • The choice of what to do depends on the complexity of the material you are working with. Attribute n= is optional and repeatable.
  • For editions within editions PUT EXAMPLE SLIDE IN HERE
  • and are optional (for Level 4)
  • DO PRACTICE USING A POEM. Check for well-formedness, NOT validity.
  • We are using xml:id. n= is the page number printed on the page.
  • GO TO REFERENCE PRACTICE
  • There are at least a dozen solutions to overlapping
  • Graphic and pb tags are EMPTY elements
  • womā wo-mā woman wo-man
  • Textual splitting: parallelism at a more local level. The <choice> option: see the abbreviated version or the expanded version; you can encode both. You may want to show typographical errors. You may want to show normalization or old-style features (e.g. old typography).
  • Notice that this goes in the TEI HEADER. This, too, is an informationally weak approach.
  • This approach is more powerful than approach #1. This approach is part of the header PRACTICE – ENCODE IMAGE
  • TEI Guidelines: can be applied strictly or loosely; can adapt to local conditions; are designed as a set of modules that can be selected as needed; are not unlike a human language in some respects
  • Club analogy: being a member of a model class is like being a member of a workgroup: you may be called upon for certain tasks; being a member of an attr class is like being a member of a club with member benefits (i.e., attributes)
  • ROMA is the schema tool
  • Go to the ROMA tool and spend some time with this
  • delete most elements from most modules delete the key attribute from name delete most attrs from att.global and att.global.linking constrain type of div or name

Xml Case Learns 2008 Presentation Transcript

  • 1. Introduction to Text Encoding and the Text Encoding Initiative (TEI). Richard Wisneski, Head, Bibliographic/Metadata Services, Kelvin Smith Library, Case Western Reserve University, 2009-2010
  • 2.
    • This stuff can get difficult.
    • This stuff takes time, practice, and patience to learn
    • We can only cover so much in this session, but there are further resources to consult after this session…
    First, Some Ground Rules
  • 3.
    • P5 Guidelines, PDF link (The current “Bible” for text encoding): http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html
    • P5 Guidelines, esp. Appendix C: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html
    • Women Writers Project Guide to Scholarly Encoding: http://www.wwp.brown.edu/encoding/guide/index.html
    Sources to Consult
  • 4. PART 1: Overview of Text Encoding
  • 5.
    • Text encoding marks up a document in XML to capture metadata (administrative, descriptive, technical, preservation) AND represent textual features important for research.
    • Examples:
    • The Poetess Archive
    • Women Writers Online
    • The Dolly Madison Digital Edition
    • The Walt Whitman Archive
    What Is Text Encoding?
  • 6. Quick Example
      <lg>
        <head>After <del>an</del><add>the <del>unsolv’d</del></add> argument</head>
        <l><del>The</del><add><del>Coming in,</del> A group of</add> little children, and their <lb/>ways and chatter, flow in <del>upon me</del></l>
        <l>Like <add>welcome</add> rippling water o'er my <lb/>heated <add>nerves and</add> flesh.</l>
      </lg>
  • 7.
    • Text encoding does NOT attempt to provide one unique, authoritative version of a work. It often pairs the document with interpretation (markup and metadata)
    • Text encoding does NOT provide one static, permanent markup for a document. While alternative markup is acceptable in certain instances, some markup is simply incorrect
    • Text encoding (TEI) is NOT meant to have an encoding recommendation for all possibilities, but rather intends to be customized and modified within TEI guidelines
    What Text Encoding Is NOT
  • 8.
    • To allow researchers to have access to an electronic text that does not require special-purpose software or hardware
    • To analyze information – provide a standard text-encoding scheme and metadata language which accommodates searching, retrieval, etc.
    • To share information – have a standard format for data interchange in humanities research
    Why Do Text Encoding?
  • 9.
    • Tailor searching under specific genres (e.g. verse, drama, prose)
    • Search different formats (e.g. chronicle, diary)
    • Search across collections
    • Search by mode (e.g. satire, pastoral)
    • Search by historical or geographic period
    • Search by title, author, and subject headings
    • Search via structural features of text itself, including:
      • Sections
      • Headings
      • Paragraphs
      • Quotations
      • Highlighting
      • Footnotes
      • Captions
    Text Encoding Allows Users To…
  • 10.
    • Digital libraries and digital archives
    • Anthropology and social sciences
    • Literary and cultural materials
    • Scholarly editions
    • Manuscript collections and descriptions
    • Dictionaries
    • Language corpora
    • Historical documents
    • Authoring
    • Linguistics
    Who Does Text Encoding? Where Is It Found?
  • 11.
    • Technically : a standards organization for humanities text encoding
    • Organizationally : an international membership consortium
    • Socially : a community of people and projects
    • Web site: http://www.tei-c.org/
    What Is the Text Encoding Initiative (TEI)?
  • 12. PART 2: Text Encoding and XML
  • 13.
    • Texts are encoded using eXtensible Markup Language (XML)
    • XML is…
    • Easy to understand.
    • Non-proprietary plain-text:
      • Human readable
      • Software independent
      • Hardware independent
    • (relatively) easy to write a parser for.
    • Widespread: Well-supported by commercial and open-source software.
    Text Encoding and XML
  • 14. XML Documents Must Be:
    • Well-formed: Have no syntax errors and conform to XML code specifications
    • <title>Little Memoirs of the Nineteenth Century</title> <author>George Paston</author>
    • Valid: Satisfy the rules of a DTD, W3C XML Schema, or RELAX NG schema
    • If the DTD or schema says that the author name must come before the title, then the content above would be rejected as invalid
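The difference between well-formed and valid can be tried out with Python's standard-library parser. This is a minimal sketch under stated assumptions: the parser checks only well-formedness, so the author-before-title rule is a hypothetical check written by hand to stand in for what a DTD or schema would enforce, and the wrapper element <book> is an assumption added to give the fragment a single root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML; this checks syntax only, not validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def author_before_title(xml_text: str) -> bool:
    """Hypothetical validity rule: <author> must come before <title>."""
    tags = [child.tag for child in ET.fromstring(xml_text)]
    if "author" not in tags or "title" not in tags:
        return False
    return tags.index("author") < tags.index("title")


fragment = ("<title>Little Memoirs of the Nineteenth Century</title>"
            "<author>George Paston</author>")
document = "<book>" + fragment + "</book>"  # hypothetical single root element

print(is_well_formed(fragment))       # False: two top-level elements, no single root
print(is_well_formed(document))       # True: well-formed
print(author_before_title(document))  # False: well-formed, but breaks the rule
```

A real project would hand the rule to a schema-aware validator rather than code it by hand; the point here is only that the two checks are separate.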
  • 15. XML Vocabulary
    • Elements, Content, Attributes, Values
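    • <titleStmt>
        <title level="m">Little Memoirs</title>
      </titleStmt>
    • In <title level="m">Little Memoirs</title>: title is the element, level is an attribute, m is its value, and “Little Memoirs” is the content.
    • Elements are nested: <titleStmt> is the PARENT ELEMENT, and <title> is the CHILD ELEMENT of <titleStmt>.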
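To see the four vocabulary terms the way a parser sees them, the sketch below loads the snippet with Python's xml.etree.ElementTree. It assumes the TEI attribute level="m" (monographic title); each print shows one part of the vocabulary.

```python
import xml.etree.ElementTree as ET

snippet = '<titleStmt><title level="m">Little Memoirs</title></titleStmt>'

parent = ET.fromstring(snippet)  # <titleStmt> is the parent element
child = parent[0]                # <title> is its (only) child element

print(parent.tag)    # titleStmt
print(child.tag)     # title
print(child.attrib)  # {'level': 'm'}  (attribute name and its value)
print(child.text)    # Little Memoirs  (the element's content)
```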
  • 16.
    • <biblFull>
        <titleStmt>
          <title level="m">Early history of the Cleveland Public Schools</title>
          <author><persName>Freese, Andrew</persName></author>
        </titleStmt>
        <extent>128 p. : ill. ; 23 cm.</extent>
        <publicationStmt>
          <!-- groups information concerning publisher, place of publication, and date of the text -->
          <pubPlace>Cleveland, Ohio</pubPlace>
          <publisher>Robison, Savage &amp; Co., Book Printers</publisher>
          <date when="1876">1876</date>
          <!-- contains a date in any format, with a normalized value in the when attribute, of the bibliographic item's original publication -->
        </publicationStmt>
        <notesStmt>
          <note>by Andrew Freese ; Published by order of the Board of Education.</note>
        </notesStmt>
      </biblFull>
    Quick Example
  • 17.
    • A valid TEI document follows the rules of a schema that describes it.
    • The Schema (or DTD) ensures that all required elements are present in the document
    • The schema may prevent undefined elements from being used
    • The schema may enforce a specific data structure
    • The schema may specify the use of attributes and define their possible values
    • The schema may define default values for attributes
    • An XML document can be well-formed but NOT valid
    • An XML document can never be valid without being well-formed
    Validity
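Each job a schema does (requiring elements, rejecting undefined ones, enforcing order, constraining attribute values, supplying defaults) can be sketched by hand. The rules in BOOK_RULES and the validate() function are hypothetical illustrations, not a real schema language; a real project would write a DTD, W3C XML Schema, or RELAX NG schema and run a validating tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <book> element, standing in for a DTD or schema.
BOOK_RULES = {
    "required": ["author", "title"],                        # must be present
    "allowed": {"author", "title", "extent"},               # nothing else
    "order": ["author", "title", "extent"],                 # required sequence
    "attr_values": {"measure": {"centimeters", "inches"}},  # permitted values
    "attr_defaults": {"measure": "centimeters"},            # supplied if absent
}


def validate(elem: ET.Element, rules: dict) -> list:
    """Return a list of rule violations; an empty list means 'valid'."""
    errors = []
    tags = [child.tag for child in elem]
    for name in rules["required"]:
        if name not in tags:
            errors.append(f"missing required element <{name}>")
    for name in tags:
        if name not in rules["allowed"]:
            errors.append(f"undefined element <{name}>")
    known = [t for t in tags if t in rules["order"]]
    if known != sorted(known, key=rules["order"].index):
        errors.append("elements are out of order")
    for attr, value in elem.attrib.items():
        permitted = rules["attr_values"].get(attr)
        if permitted is not None and value not in permitted:
            errors.append(f"bad value {value!r} for attribute {attr}")
    for attr, default in rules["attr_defaults"].items():
        elem.attrib.setdefault(attr, default)  # apply default attribute values
    return errors


# Well-formed, so it parses, but it breaks two of the rules above.
book = ET.fromstring(
    "<book><title>Little Memoirs</title><author>George Paston</author>"
    "<publisher>[publisher]</publisher></book>"
)
for problem in validate(book, BOOK_RULES):
    print(problem)
```

The sample document illustrates the slide's last two points: it is well-formed (it parses) but not valid.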
  • 18. Schema Examples
    • Instance: <book measure="centimeters">21</book>
      Schema:
      <xs:element name="book">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="measure" type="xs:string"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
    • Instance:
      <book bookISBN="152-32-29359535">
        <!-- Go Tell It on the Mountain -->
        <authorLastName>Baldwin</authorLastName>
        <authorFirstName>James</authorFirstName>
      </book>
      Schema (the ref= attributes assume that authorLastName, authorFirstName, and bookISBN are declared globally elsewhere in the schema):
      <xs:element name="book">
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="authorLastName"/>
            <xs:element ref="authorFirstName"/>
          </xs:sequence>
          <xs:attribute ref="bookISBN" use="required"/>
        </xs:complexType>
      </xs:element>
  • 19. PART 3: Levels of TEI Encoding
  • 20.
    • Latest iteration of TEI is Protocol 5 (a.k.a. P5)
    • Current TEI Consortium Best Practices Group (formed in Summer 2008) has been establishing best practices and standards for:
      • TEI headers
      • Level One encoding: Fully Automated Conversion and Encoding
      • Level Two: Minimal Encoding
      • Level Three: Simple Analysis
      • Level Four: Basic Content Analysis
      • Level Five: Scholarly Encoding Projects
    • The BPG will present its work at the Digital Library Federation conference in early May, get feedback, and publish a final document later in 2009
    Five Levels
  • 21. Level 1 Encoding: Fully Automated Conversion and Encoding
    • To create electronic text with the primary purpose of keyword searching and linking to page images
    • The text is subordinate to the page image, and is not intended to stand alone as an electronic text (without page images).
    • Most suitable for:
      • A large volume of material to be made available online quickly
      • When a digital image of each page is desired
      • No manual intervention is performed in the text creation process
      • material is of interest to a large community of users who wish to read texts that allow keyword searching
      • sophisticated search and display capabilities based on the structure of the text are not necessary
  • 22. Level 1 Encoding: Characteristics
    • <div1> or <div>: There should be only one child of <body>: a single <div> (or <div1>).
    • <ab>: There should be only one child of the <div> (or <div1>): a single <ab> wrapping all OCR text. If the text is ever “upgraded” to Level 3 or higher, the <ab> element will be replaced by structural elements like <p> and <table>.
    • <pb>: Required in Level 1. Page images can be linked to the text by specifying a JPEG or other image file as the value of the facs= attribute. Page numbers can be supplied with the n= attribute to record the number that is on the page.
    • The Task Force sees the use of METS here as having a tremendous advantage. METS/TEI page-turning documentation will be included in the near future.
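Because Level 1 is meant to be produced by fully automated means, its structure is simple enough to generate in a few lines. This sketch builds the <body> described above: one <div>, one <ab>, and a <pb> with n= and facs= before each page's text. The (page_number, image_file, ocr_text) input tuples and the file names are hypothetical stand-ins for whatever an OCR batch actually produces.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a local name with the TEI namespace, ElementTree-style."""
    return f"{{{TEI_NS}}}{tag}"


def level1_body(pages):
    """Build a Level 1 <body> from (page_number, image_file, ocr_text) tuples."""
    body = ET.Element(tei("body"))
    div = ET.SubElement(body, tei("div"))  # the single child of <body>
    ab = ET.SubElement(div, tei("ab"))     # the single child of <div>
    for number, image, text in pages:
        # <pb/> is empty; the page's OCR text follows it as the element's tail
        pb = ET.SubElement(ab, tei("pb"), {"n": str(number), "facs": image})
        pb.tail = text
    return body


pages = [(1, "p1.jpg", "First page of uncorrected OCR text."),
         (2, "p2.jpg", "Second page of uncorrected OCR text.")]
print(ET.tostring(level1_body(pages), encoding="unicode"))
```

Storing the page text as the tail of each <pb/> is what places the empty page-break element in front of the text of its page.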
  • 23. Level 2 Encoding: Minimal Encoding
    • To create electronic text for full-text searching, linking to page images, and identifying simple structural hierarchy to improve navigation. (For example, you can create a table of contents from such encoding.)
    • The text is mainly subordinate to the page image, though navigational markers (textual divisions, headings) are captured. However, the text could stand alone as electronic text (without page images)
    • Requires some human intervention to identify each textual division and heading.
    • Most suitable for:
      • A large volume of material to be made available online quickly
      • When a digital image of each page is desired
      • material is of interest to a large community of users who wish to read texts that allow keyword searching
      • Rudimentary search and display capabilities based on the large structures of the text are desired
      • Each text is checked to ensure that divisions and headers are properly identified
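The table-of-contents point can be shown concretely: once divisions and headings are marked up, a contents list falls out of the encoding. This sketch assumes unnumbered <div> elements in the TEI namespace (numbered <div1>, <div2>, and so on would need a different lookup). The function name and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def table_of_contents(tei_xml: str) -> list:
    """List <div> headings from <body>, indented two spaces per nesting level."""
    body = ET.fromstring(tei_xml).find(".//tei:body", NS)
    entries = []

    def walk(parent, depth):
        for div in parent.findall("tei:div", NS):
            head = div.find("tei:head", NS)
            if head is not None:
                title = " ".join("".join(head.itertext()).split())
            else:
                title = f"[untitled {div.get('type', 'div')}]"
            entries.append("  " * depth + title)
            walk(div, depth + 1)  # descend into nested divisions

    walk(body, 0)
    return entries


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div type="section"><head>Heading of section 1</head><ab>...</ab></div>
  <div type="section">
    <div type="subsection"><head>Section 2, subsection 1</head><ab>...</ab></div>
  </div>
</body></text></TEI>"""

for line in table_of_contents(sample):
    print(line)
```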
  • 24. Level 2 Encoding: Characteristics
    • All elements specified in Level 1, plus the following:
    • <front>, <back>: Optional.
    • <div1> or <div>: If no type= attribute is specified, a type= value of "section" should be presumed.
    • <head>: Required if present.
    • <ab>: At least one container element is required.
    • <fw>: Running heads; can be automatically generated.
  • 25.
    • <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
      <TEI xmlns="http://www.tei-c.org/ns/1.0">
        <teiHeader type="text"> [stuff] </teiHeader>
        <text>
          <front>
            [title page information, table of contents, prefaces, etc.]
            [optional]
          </front>
          <body>
            <div type="section">
              <pb n="1" facs="[URI of page 1 image]"/>
              <head>[heading of section 1]</head>
              <ab>[entire contents of section 1 here, with
                interspersed <pb/> elements pointing to page
                images; in this example there are 26 more pages
                to section 1]</ab>
            </div>
            <div type="section">
              <pb n="27" facs="[URI of page 27 image]"/>
              <div type="subsection">
                <head>[heading of section 2 subsection 1]</head>
                <ab>[all the paragraphs of subsection one go here
                  with page breaks inserted]</ab>
              </div>
            </div>
          </body>
          <back> [optional] </back>
        </text>
      </TEI>
    P5 Level 2 Encoding Template
  • 26.
    • <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
      <TEI xml:id="someid" xmlns="http://www.tei-c.org/ns/1.0">
        <teiHeader> [Source and processing information goes here] </teiHeader>
        <text>
          <body>
            <div1>
              <pb n="113" facs="00000001.tif"/>
              <head>POINT VIII: BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; ALTERNATIVELY, DISCOVERY AND A HEARING SHOULD BE ORDERED.</head>
              <ab> POINT VIII. BECAUSE OF UNLAWFUL SURVEILLANCE, PETITIONER'S CONVICTION SHOULD BE VACATED; …
                <pb n="114" facs="00000002.tif"/> on the Hiss mail in 1945, …
                <pb n="115" facs="00000003.tif"/> occurred from December 13, 1945 until the Hisses moved from Washington, D.C. to New York City on September 13, 1947. …
              </ab>
            </div1>
          </body>
        </text>
      </TEI>
    P5 Level 2 Encoding Example
  • 27. Level 3 Encoding: Simple Analysis
    • To create text that can stand alone as electronic text
    • Identifies hierarchy (logical structure) and typography without content analysis being of primary importance
    • Features to be encoded are determined by the logical structure and appearance of the text
    • can stand alone as text without page images
    • Most suitable for:
      • Some sophistication of display, delivery, and searching based on structure of the text is desired
      • Texts will be checked to ensure that encoding decisions have been made appropriately
      • material is of interest to a large community of users who wish to read texts that allow keyword searching
  • 28. Level 3 Encoding: Characteristics
    • All elements specified in Levels 1 and 2, plus the following:
    • <front>, <back>: Required if present.
    • <div>: Required if present; type attribute is recommended.
    • <floatingText>: Recommended if present.
    • <p>: Required for paragraph breaks in prose.
    • <lg> and <l>: Required for identifying groups of lines and lines, respectively.
    • <list> and <item>: May be used in this level to indicate ordered and unordered list structures.
    • <table>, <row>, and <cell>: May be used to indicate table structures.
    • <figure>: Required to indicate figures other than page images.
    • <hi>: Required to indicate changes in typeface; rend attribute is optional.
    • <note>: All notes must be encoded. It is also recommended that notes that extend beyond one page be combined into one <note> element. Marginal notes, without reference, should occur at the beginning of the paragraph to which they refer, with the value of the place attribute as "margin".
  • 29. Level 3 Encoding: General Recommendations
    • Front matter
      • <div type="contents">: Use lists to mark up the table of contents, with the <ptr> tag used to reference the starting page number. The <ptr> tag can reference the <pb> identifier OR an identifier (e.g., @xml:id) placed in the corresponding division of text.
    • Body
      • <note> Inline. The note is inserted at the point of reference. An n attribute records the value of the note reference if there is one
    • Back
      • <div type="index">: Use lists to mark up index entries, with the <ref> tag used to reference the corresponding page number. Add the "target" attribute (@target) to reference the <pb> identifier to generate links from the index into the text proper.
    • Running heads, catch words, and other such forme work information should NOT be included in Level 3, with the exception of page numbers, which are recorded using pb
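The <ptr>-to-<pb> linking recommended for front matter can be checked mechanically: every contents entry should point at a page break that actually exists. This is a hypothetical helper written for illustration. It indexes <pb> elements by xml:id and reports None for any target that has no matching page break; the sample deliberately omits the second chapter's page break so that entry resolves to None.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's name for xml:id


def resolve_contents(tei_xml: str) -> list:
    """Pair each contents <item> label with the n= of the <pb> its <ptr> targets.

    A page value of None marks a broken link: no <pb> has that xml:id.
    """
    root = ET.fromstring(tei_xml)
    # Index every identified page break by its xml:id.
    pages = {pb.get(XML_ID): pb.get("n")
             for pb in root.iter(f"{{{TEI}}}pb") if pb.get(XML_ID)}
    results = []
    for contents in root.iterfind(".//tei:div[@type='contents']", NS):
        for item in contents.iter(f"{{{TEI}}}item"):
            ptr = item.find("tei:ptr", NS)
            target = ptr.get("target", "") if ptr is not None else ""
            page_id = target[1:] if target.startswith("#") else target
            label = (item.text or "").strip()
            results.append((label, pages.get(page_id)))
    return results


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>
<front><div type="contents"><head>CONTENTS</head><list type="simple">
  <item>I. A Boy and His Dog <hi rend="right">3</hi> <ptr target="#freear-p03"/></item>
  <item>II. Romance <hi rend="right">12</hi> <ptr target="#freear-p12"/></item>
</list></div></front>
<body><div type="chapter"><pb n="3" xml:id="freear-p03"/>[text]</div></body>
</text></TEI>"""

print(resolve_contents(sample))
```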
  • 30. Level 3 Encoding: Prose Example
      <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
        <teiHeader> [stuff] </teiHeader>
        <text>
          <front>
            <div type="frontispiece">[figure]</div>
            <titlePage>[text]</titlePage>
            <div type="dedication">[text]</div>
            <div type="contents">[text]</div>
          </front>
          <body>
            <div type="book">
              <head>[book title]</head>
              <div type="chapter">
                <pb n="3" xml:id="freear-p03"/>[text]
              </div>
              <div type="chapter">
                <pb n="12" xml:id="freear-p12"/>[text]
              </div>
              <div type="chapter">[text]</div>
            </div>
          </body>
          <back>
            <div type="appendix">[text]</div>
            <div type="index">[text]</div>
          </back>
        </text>
      </TEI>
    Table of Contents:
      <!-- @target references the page break identifier -->
      <div type="contents">
        <head>CONTENTS</head>
        <list type="simple">
          <item>I. A Boy and His Dog <hi rend="right">3</hi> <ptr target="#freear-p03"/></item>
          <item>II. Romance <hi rend="right">12</hi> <ptr target="#freear-p12"/></item>
        </list>
      </div>
  • 31. Level 3 Encoding: Verse Example
      <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
        <teiHeader> [stuff] </teiHeader>
        <text>
          <front>
            <titlePage>[text]</titlePage>
            <div type="dedication">[text]</div>
            <div type="contents">[text]</div>
          </front>
          <body>
            <div type="book">
              <head>[book title]</head>
              <div type="part">
                <head>[section title]</head>
                <div type="poem">
                  <head>THE DAYS GONE BY.</head>
                  <lg>
                    <l n="1">O the days gone by! O the days gone by!</l>
                    <l n="2">The apples in the orchard, and the pathway through the rye;</l>
                    <l n="3">The chirrup of the robin, and the whistle of the quail</l>
                    <l n="4">As he piped across the meadows sweet as any nightingale;</l>
                  </lg>
                  <lg>[lines of poetry]</lg>
                  <lg>[lines of poetry]</lg>
                </div>
              </div>
            </div>
          </body>
        </text>
      </TEI>
  • 32. Level 4 Encoding: Basic Content Analysis
    • To create text that can stand alone as electronic text
    • identifies hierarchy and typography
    • specifies function of textual and structural elements
    • describes the nature of the content and not merely its appearance.
    • Features of the text that may contribute to meaning, such as indentation of verse lines and typographic change, are preserved
    • Most suitable for:
      • sophisticated search and retrieval capabilities are desired
      • texts will be used for textual analysis
      • users of the texts may have limited storage or display capabilities
  • 33. Level 4 Encoding: Characteristics
    • All elements specified in Levels 1, 2, and 3, plus the following (et cetera; see the TEI BPG Guidelines):
    • <titlePage> and child elements: Required if present.
    • <group>: Required to encode a collection of independent texts that are regarded as a single group for processing or other purposes.
    • <emph>, <foreign>, <gloss>, <term>, or <title>: Recommended to identify typographically distinct text.
    • <epigraph>, <quote>, <said>, <mentioned>, or <soCalled>: Recommended to represent speech, thought, quotation, etc.
    • <sic>, <corr>, or <choice>: Recommended to encode errors or typos.
    • <add>, <del>, <gap>, and <unclear>: Recommended to encode material that is omitted, added, marked for deletion, or is illegible, invisible, or inaudible.
    • <opener>, <dateline>, <salute>, <closer>, <signed>, <postscript>: Required to indicate specific parts of letters.
    • <sp>, <speaker>, and <stage>: Required to encode different dramatic structures.
    • <sp> and <speaker>: Required to encode oral history interviews.
  • 34.
    • <p>But it is well authenticated by the observation of every one, that <del rend="overstrike" hand="#JHL">their manner</del> <add rend="sup" hand="#JHL">this way—i.e. the above</add> of writing influences the style of compos. of those who practise it considerably, when they grow up to years of manhood; for their productions, <del hand="#JHL" rend="overstrike">instead</del> far from being terse, argumentative, convincing, are without head or tail &amp; are generally an incongruous mass mixed up in the most disgusting manner, without divisions or heads &amp; in short without a subject (so to speak).</p>
    Example of Level 4 Encoding
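One payoff of encoding <add> and <del> rather than silently transcribing the final state is that both the revised and the original readings stay recoverable. This minimal sketch, with a hypothetical function name, flattens a paragraph while skipping the content of one kind of element. Dropping <del> yields the revised text, and dropping <add> yields the text before revision. It compares local names, so it works with or without the TEI namespace.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop an ElementTree namespace prefix such as '{uri}'."""
    return tag.split("}")[-1]


def version(elem: ET.Element, omit: str) -> str:
    """Flatten elem's text, skipping the content of every element named `omit`.

    omit="del" yields the revised reading; omit="add" yields the original one.
    """
    parts = []

    def walk(node):
        if local(node.tag) == omit:
            return  # skip this element's content; the caller keeps its tail
        parts.append(node.text or "")
        for child in node:
            walk(child)
            parts.append(child.tail or "")

    walk(elem)
    return " ".join("".join(parts).split())  # collapse runs of whitespace


p = ET.fromstring(
    '<p>that <del rend="overstrike" hand="#JHL">their manner</del> '
    '<add rend="sup" hand="#JHL">this way—i.e. the above</add> '
    'of writing influences the style</p>'
)
print(version(p, omit="del"))  # the text as revised by JHL
print(version(p, omit="add"))  # the text before JHL's revision
```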
  • 35. Level 5 Encoding: Scholarly Encoding Projects
    • Level 5 texts are those that require subject knowledge, and encode semantic, linguistic, prosodic, or other elements beyond a basic structural level
  • 36.
    • <l>So hath myn
        <app>
          <lem wit="#msB #msC">herte</lem>
          <rdg wit="#msA">hert</rdg>
          <rdg wit="#msD">minde</rdg>
          <rdg wit="#msE">mynde</rdg>
        </app>
        Caught in remembraunce</l>
    Example: Variant Readings in Level 5 (<app>: critical apparatus; <lem>: lemma, or base text; <rdg>: variant reading)
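Because each <lem> and <rdg> names its witnesses in wit=, the text of any single manuscript can be reconstructed from the apparatus. This sketch assumes an un-namespaced fragment like the slide's and a hypothetical fallback rule: a witness not listed in any reading gets the <lem> (base text).

```python
import xml.etree.ElementTree as ET


def witness_text(line: ET.Element, wit: str) -> str:
    """Reconstruct one witness's text of a line containing <app> entries."""
    parts = [line.text or ""]
    for child in line:
        if child.tag == "app":
            chosen = child.find("lem")  # default: the lemma (base text)
            for reading in child:       # <lem> and <rdg> children
                if wit in reading.get("wit", "").split():
                    chosen = reading
                    break
            parts.append((chosen.text or "") if chosen is not None else "")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())  # collapse whitespace


line = ET.fromstring(
    '<l>So hath myn <app>'
    '<lem wit="#msB #msC">herte</lem>'
    '<rdg wit="#msA">hert</rdg>'
    '<rdg wit="#msD">minde</rdg>'
    '<rdg wit="#msE">mynde</rdg>'
    '</app> Caught in remembraunce</l>'
)
for wit in ["#msA", "#msB", "#msD", "#msE"]:
    print(wit, witness_text(line, wit))
```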
  • 37. General Recommendations
    • An encoding project should strive for internal consistency and for use of standards so that the data can be modified or enhanced in the future with ease
    • When reformatting to digital media using any level of encoding, the electronic text should begin with the transcription of the first word on the first leaf of the original work
    • Certain features of the text, such as publisher's advertisements or indexes, should be included as links to page images
    • Any omissions of material found in the original work should be noted in the <editorialDecl> in the TEI header
    • An encoding project should use only numbered divisions (i.e., <div1>, <div2>, etc.) or unnumbered divisions (i.e., <div>) but not both
    • Whether numbered or unnumbered divisions are used, the @type attribute of the division element is not recommended at level 1, is optional at level 2, is recommended at level 3, and required at levels 4 and 5
    • Page breaks should be encoded using the <pb> element, which should mark the top of a page (i.e. the text of page seven should immediately follow <pb n="7"/>), and should always be contained within a div for ease of retrieval with indexing software
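Two of these recommendations are structural enough to check automatically: not mixing numbered and unnumbered divisions, and keeping every <pb> inside a division. The checker below is a hypothetical sketch of such a check, not part of any TEI tool. It walks up from each <pb> using a child-to-parent map and treats <div> and <div1> through <div7> as divisions.

```python
import re
import xml.etree.ElementTree as ET

DIVISION = re.compile(r"div[1-7]?$")  # <div> or numbered <div1>..<div7>


def local(tag: str) -> str:
    """Strip an ElementTree namespace prefix such as '{uri}'."""
    return tag.split("}")[-1]


def check_recommendations(tei_xml: str) -> list:
    """Flag two of the general recommendations (hypothetical checker)."""
    root = ET.fromstring(tei_xml)
    parent = {child: node for node in root.iter() for child in node}
    problems = []

    # Use numbered OR unnumbered divisions, not both.
    kinds = {local(e.tag) for e in root.iter() if DIVISION.match(local(e.tag))}
    if "div" in kinds and kinds - {"div"}:
        problems.append("mixes numbered and unnumbered divisions")

    # Every <pb> should sit inside some division.
    for elem in root.iter():
        if local(elem.tag) != "pb":
            continue
        node = elem
        while node in parent and not DIVISION.match(local(parent[node].tag)):
            node = parent[node]
        if node not in parent:  # climbed to the root without meeting a division
            problems.append(f"pb n={elem.get('n')} is not inside a division")
    return problems


sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>
<front><pb n="i"/></front>
<body><div><pb n="1"/></div><div1><pb n="2"/></div1></body>
</text></TEI>"""

for problem in check_recommendations(sample):
    print(problem)
```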
  • 38. PART 4: Short Practice in Text Encoding
  • 39.
    • Author: James Wallen
    • Title: Cleveland’s Golden Story
    • Publishing Place and Publisher: [Cleveland, OH]: Wm. Taylor Son & Co.
    • Year: 1920
    • 93 pp.
    • CONTENTS
      • Chapter 1. The Kingdom of God. 1
      • Chapter 2. Lincoln-Hearted Men 9
      • Chapter 3. Taming the Wilderness 19
    TEXT ETC.
  • 40.
    • Chapter 1: The Kingdom of Gold
    • Gold is the symbol of adventure—the unresting urge that stirs men’s souls. Francois de Orlenna, who crossed the South American continent from ocean to ocean in 1540, wrote, “Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.”
    • His catalog of iritations included:
    • 1. The weather
    • 2. The peacocks
    • 3. His meagre grasp of Hamlet, Prince of Denmark
    Chapter Heading and Paragraph
  • 41.
    • <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
      <TEI xmlns="http://www.tei-c.org/ns/1.0">
        <teiHeader type="text"> [stuff goes here] </teiHeader>
        <text>
          <front>
            <div type="contents">
              <list>
                <item>Chapter 1. The Kingdom of God. 1</item>
                <item>Chapter 2. Lincoln-Hearted Men. 9</item> [ETC.]
              </list>
            </div>
          </front>
          <body>
            <div type="section">
              <pb n="1" facs="p1.jpg"/>
              <head>The Kingdom of God</head>
              <ab>
                [a whole section is contained within this anonymous block tag, interspersed with <pb> elements pointing to page images]
                <pb xml:id="p21198-zz0002mpwb" n="2"/>
              </ab>
            </div>
          </body>
          <back> [optional] </back>
        </text>
      </TEI>
    P5 Level 2 Encoding
  • 42.
    • <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?> <TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.tei-c.org/ns/1.0">
    • <teiHeader type="text">[stuff goes here]</teiHeader>
    • <text>
    • <front>
    • <pb n="1" xml:id="walcle01-00"/>
    • <div type="contents">
    • <list>
    • <item>Chapter 1. The Kingdom of God. <hi rend="right"> 1 </hi> <ptr target="#walcle01-p1"/> </item>
    • <item>Chapter 2. Lincoln-Hearted Men. <hi rend="right"> 9 </hi> <ptr target="#walcle01-p9"/> </item> [ETC.]
    • </list>
    • </div>
    • </front>
    • <body>
    • <div type="chapter">
    • <pb n="1" xml:id="walcle01-p1"/>
    • <head type="main"> Chapter 1 </head>
    • <head type="subtitle"> The Kingdom of God </head>
    • <p> [FIRST PARAGRAPH GOES HERE] </p>
    • </div>
    • </body>
    • <back> [optional] </back> </text></TEI>
    P5 Level 3 Encoding
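In the Level 3 markup, the contents list reaches its chapters through <ptr target="#..."> pointing at <pb xml:id="...">, so a script can confirm that no pointer dangles. A minimal sketch using Python's standard-library ElementTree, assuming a cut-down document in which the page ID walcle01-p9 is hypothetical. Every lookup has to be qualified with the TEI namespace (http://www.tei-c.org/ns/1.0), because the document declares it as the default namespace.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Hypothetical, cut-down Level 3 document; page IDs follow the slide's pattern.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>
<front><div type="contents"><list>
  <item>Chapter 1. The Kingdom of God. <ptr target="#walcle01-p1"/></item>
  <item>Chapter 2. Lincoln-Hearted Men. <ptr target="#walcle01-p9"/></item>
</list></div></front>
<body>
  <div type="chapter"><pb n="1" xml:id="walcle01-p1"/><head>Chapter 1</head></div>
  <div type="chapter"><pb n="9" xml:id="walcle01-p9"/><head>Chapter 2</head></div>
</body></text></TEI>"""


def tei(tag):
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def toc_pages(root):
    """Map each contents entry to the page number of the <pb> its <ptr> targets."""
    pages = {pb.get(XML_ID): pb.get("n") for pb in root.iter(tei("pb"))}
    front = root.find(f"{tei('text')}/{tei('front')}")
    result = {}
    for item in front.iter(tei("item")):
        target = item.find(tei("ptr")).get("target").removeprefix("#")
        if target not in pages:
            raise ValueError(f"dangling pointer: #{target}")
        result[item.text.strip()] = pages[target]
    return result


print(toc_pages(ET.fromstring(DOC)))
```

Run as is, it should report page 1 for Chapter 1 and page 9 for Chapter 2; renaming a <pb> ID so a pointer no longer matches raises the ValueError instead.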
  • 43.
    • <p> Gold is the symbol of adventure—the unresting urge that stirs men’s souls. Francois de Orlenna, who crossed the South American continent from ocean to ocean in 1540, wrote, “Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.” </p>
    • <p> His catalog of iritations included:
    • <list>
    • <item> 1. The weather </item>
    • <item> 2. The peacocks </item>
    • <item> 3. His meagre grasp of <hi> Hamlet, Prince of Denmark </hi> </item>
    • </list>
    • </p>
    P5 Level 3 Continued
  • 44.
    • <p>Gold is the symbol of adventure—the unresting urge that stirs men’s souls. <name type="person" key="FDO1">Francois de Orlenna</name>, who crossed the South American continent from ocean to ocean in <date when="1540">1540</date>, wrote, <q>Having eaten our boots and saddles, boiled with a few wild herbs, we set out to reach the Kingdom of Gold.</q></p>
    • <p>His catalog of <choice><sic>iritations</sic><corr>irritations</corr></choice> included:
    • <list>
    • <item> 1. The weather</item>
    • <item> 2. The peacocks </item>
    • <item> 3. His meagre grasp of <hi> <bibl><title ref="#hamlet1"> Hamlet, Prince of Denmark </title></bibl> </hi> </item>
    • </list>
    • </p>
    • <biblStruct xml:id="hamlet1">
    • <monogr>
    • <author> Shakespeare, William </author>
    • <title> Hamlet, Prince of Denmark </title>
    • <imprint> <date> [date goes here] </date> </imprint>
    • </monogr> </biblStruct>
    P5 Level 4 Encoding
  • 45. PART 5: TEI Header
  • 46.
    • Provides administrative, descriptive, and preservation metadata
      • Administrative: who created the metadata? When was it created? Where is the original item located? Etc.
      • Descriptive: title, author, publication info, subject headings, number of pages, etc.
      • Preservation: file size, identifier, format, etc.
    TEI Header
  • 47.
    • Electronic Version Information
      • Information about the ELECTRONIC version of the work(s)
    • Electronic Distributor Information
      • Information about the publisher of the ELECTRONIC version of the work(s)
      • E.g. William Taylor & Co. published the original work, but Kelvin Smith Library is publishing the electronic version
    • Original Document Bibliographic Information
      • Bibliographic information of the text from which the electronic version was derived. May be generated from MARC record (but does not have to be).
    • Encoding Description
      • Includes project description, encoding level declaration, what classification structure is used (e.g. LCSH), etc.
    • Profile Description
      • Includes text language, subject terms
    • Revision Description
      • If any revision was done to the TEI document, this is where that information is recorded, including revision details, the party (or parties) involved, and the date(s)
    Basic Components of TEI Header
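In TEI P5 these components correspond to specific elements: the electronic version's title goes in <titleStmt>, its publisher in <publicationStmt>, and the original's bibliographic data in <sourceDesc> (all three inside <fileDesc>), followed by <encodingDesc>, <profileDesc>, and <revisionDesc>. As a sketch, that ordered skeleton can be generated with Python's ElementTree; the strings passed in are placeholders, and the empty elements would need real content before the header validates.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace when serializing


def t(tag):
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def header_skeleton(title, publisher, source):
    """Build an ordered teiHeader skeleton; the three arguments are placeholder strings."""
    header = ET.Element(t("teiHeader"))
    file_desc = ET.SubElement(header, t("fileDesc"))
    # Electronic version information
    ET.SubElement(ET.SubElement(file_desc, t("titleStmt")), t("title")).text = title
    # Electronic distributor information
    ET.SubElement(ET.SubElement(file_desc, t("publicationStmt")), t("publisher")).text = publisher
    # Original document bibliographic information
    ET.SubElement(ET.SubElement(file_desc, t("sourceDesc")), t("bibl")).text = source
    # Encoding, profile, and revision descriptions: left empty in this sketch
    for part in ("encodingDesc", "profileDesc", "revisionDesc"):
        ET.SubElement(header, t(part))
    return header


h = header_skeleton("Cleveland's Golden Story: an electronic edition",
                    "Kelvin Smith Library", "Cleveland's Golden Story (1920)")
print(ET.tostring(h, encoding="unicode"))
```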
  • 48.
    • Can reflect a text center’s standards and serve as the basis for records in other metadata systems
    • Can function in detached form as records in a catalog, as a title page inherent to the document, or as a source for index displays
    • May describe a collection of documents, a single item, or a portion of an item
    • A TEI header does NOT necessarily have a one-to-one correspondence with a MARC record: one TEI header may have multiple MARC analytic records, or one MARC record may describe a collection of TEI documents with individual headers
    • May contain an historical background on how the file has been treated and extend the information of a classic catalog record
    • There is no ONE header template; modify it as needed for each project and text type.
    TEI Header (continued)
  • 49. Example: MARC to TEI Header
    • LEADER 00000nam 2200000Ia 4500
    • 001 49237829
    • 003 OCoLC
    • 005 20020305071435.0
    • 008 020305s1905 ohu r 000 0 eng d
    • 040 CWR|cCWR
    • 049 CWRR
    • 090 BJ1161|b.G6 1905a
    • 100 1 Given, Charles Stewart
    • 245 12 A fleece of gold :|bfive lessons from the fable of Jason and the Golden Fleece /|cby Charles Stewart Given
    • 260 Cincinnati [Ohio] :|bJennings and Graham ;|aNew York [N.Y.] :|bEaton and Mains,|cc1905
    • 300 103 p. ;|c18 cm
    • 533 Photocopy.|bLaCrosse, Wis. :|cBrookhaven Press : digital production by Northern Micrographics, Inc.,|d2001.|e18 cm
    • 650 0 Success
    • 650 0 Conduct of life
    • 650 0 Jason (Greek mythology)
    • <sourceDesc>
    • <biblFull>
    • <titleStmt>
    • <title type="main">A fleece of gold</title>
    • <title type="sub">five lessons from the fable of Jason and the Golden Fleece</title> <!-- subheading [if applicable] -->
    • <author>
    • <persName>Given, Charles Stewart</persName>
    • </author>
    • </titleStmt>
    • <extent>103 p.</extent>
    • <publicationStmt> <!-- groups information concerning publisher, place of publication, and date of the text -->
    • <publisher>Jennings and Graham</publisher>
    • <pubPlace>Cincinnati [Ohio]</pubPlace>
    • <date>1905</date>
    • <idno>BJ1161 .G6 1905a</idno>
    • </publicationStmt>
    • </biblFull>
    • </sourceDesc>
    • <profileDesc> <textClass> <keywords scheme="LCSH"> <!-- if the keywords come from a controlled vocabulary, it can be identified by the scheme attribute --> <term>Success</term> <term>Conduct of life</term> <term>Jason (Greek mythology)</term> </keywords> </textClass> </profileDesc>
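The MARC-to-TEI crosswalk above is regular enough to script. A rough sketch, assuming the MARC record has already been parsed into a plain Python dict of field to subfield values with ISBD punctuation trimmed (a hypothetical input format; a real workflow would read the record with a MARC library), and keeping only the first place/publisher pair from field 260. It builds a <biblFull>, the TEI bibliographic element that admits <titleStmt>, <extent>, and <publicationStmt>; the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical pre-parsed MARC fields (subfield code -> value), punctuation trimmed.
MARC = {
    "100": {"a": "Given, Charles Stewart"},
    "245": {"a": "A fleece of gold",
            "b": "five lessons from the fable of Jason and the Golden Fleece"},
    "260": {"a": "Cincinnati [Ohio]", "b": "Jennings and Graham", "c": "1905"},
    "300": {"a": "103 p."},
    "090": {"a": "BJ1161", "b": ".G6 1905a"},
    "650": ["Success", "Conduct of life", "Jason (Greek mythology)"],
}


def sub(parent, tag, text=None, **attrs):
    """Append a child element with optional text and attributes."""
    el = ET.SubElement(parent, tag, attrs)
    el.text = text
    return el


def marc_to_tei(m):
    """Return (sourceDesc, textClass) elements following the crosswalk on the slide."""
    source = ET.Element("sourceDesc")
    full = sub(source, "biblFull")
    title_stmt = sub(full, "titleStmt")
    sub(title_stmt, "title", m["245"]["a"], type="main")  # 245 $a
    sub(title_stmt, "title", m["245"]["b"], type="sub")   # 245 $b
    sub(sub(title_stmt, "author"), "persName", m["100"]["a"])
    sub(full, "extent", m["300"]["a"])
    pub = sub(full, "publicationStmt")
    sub(pub, "publisher", m["260"]["b"])
    sub(pub, "pubPlace", m["260"]["a"])
    sub(pub, "date", m["260"]["c"])
    sub(pub, "idno", f'{m["090"]["a"]} {m["090"]["b"]}')  # local call number
    text_class = ET.Element("textClass")
    keywords = sub(text_class, "keywords", scheme="LCSH")
    for heading in m["650"]:                                # one <term> per 650
        sub(keywords, "term", heading)
    return source, text_class


source, text_class = marc_to_tei(MARC)
print(ET.tostring(source, encoding="unicode"))
```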
  • 50. Session 2: Text Encoding and the Text Encoding Initiative (TEI). Richard Wisneski, Head, Bibliographic/Metadata Services, Kelvin Smith Library, Case Western Reserve University, 2009-2010
  • 51. PART 6: Basic Text Type Structures
  • 52.
    • Before encoding a text, skim through it and mark out its structure.
    • For books, take note of:
      • Volumes
      • Parts
      • Chapters
      • Section breaks
      • Table of contents
      • End matter (indices, endnotes, bibliography, glossary, appendices, colophon, etc.)
      • Front matter (title page, dedication, epigraph, preface, introduction, etc.)
    Introduction
  • 53. Sample Book Structure: This book has (A) an introduction, (B) two chapters, each with a heading and two sections, and (C) an index.
  • 54. Book: Tree Diagram
  • 55. Basic Document Structure
    • <text>
    • <front>
    • <div type="preface"> <!-- ... --> </div>
    • <div type="introduction"> <!-- ... --> </div>
    • </front>
    • <body>
    • <div type="chapter" n="1">
    • <head type="main">Chapter 1</head>
    • <head type="subtitle">Wines</head>
    • <div type="section"> <p>White wines ...</p> </div>
    • <div type="section"> <p>Red wines ...</p> </div>
    • </div>
    • </body>
    • <back>
    • <div type="index"> <!-- … --> </div>
    • </back>
    • </text>
    Un-numbered Divisions
  • 56. Numbered Divisions within Body
    • <body>
    • <div1 type="part" n="1">
    • <div2 type="chapter" n="1"> <!-- text of part 1, chapter 1 --> </div2>
    • </div1>
    • <div1 type="part" n="2">
    • <div2 type="chapter" n="2"> <!-- text of part 2, chapter 2 --> </div2>
    • </div1>
    • </body>
    • The largest possible subdivision of the body is <div1> and the smallest possible is <div7>.
    • You CANNOT mix unnumbered and numbered divisions (e.g., <div> with <div1>).
    • Either way, it is good practice to use n= and/or xml:id= attributes. For example:
    • <div type="chapter" n="1"> (NOTE: n= is optional, and unlike xml:id its value need not be unique)
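The rule against mixing the two division systems is easy to check mechanically. A sketch, assuming an already-parsed <body> with the TEI namespace omitted, and counting only <div> and <div1> to <div7> as divisions:

```python
import re
import xml.etree.ElementTree as ET

NUMBERED = re.compile(r"div[1-7]$")  # div1 ... div7


def division_style(body):
    """Return 'numbered', 'unnumbered', or 'none'; raise if both kinds appear."""
    kinds = set()
    for el in body.iter():
        if el.tag == "div":
            kinds.add("unnumbered")
        elif NUMBERED.match(el.tag):
            kinds.add("numbered")
    if len(kinds) > 1:
        raise ValueError("body mixes <div> with <div1>-<div7>")
    return kinds.pop() if kinds else "none"


ok = ET.fromstring('<body><div1 type="part" n="1"><div2 type="chapter" n="1"/></div1></body>')
mixed = ET.fromstring('<body><div1 type="part" n="1"><div type="chapter" n="1"/></div1></body>')
print(division_style(ok))  # numbered
```

Calling division_style(mixed) raises the ValueError, which is the situation the slide warns against.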
  • 57.
    • <text>
    • <front><!-- ... --></front>
    • <group>
    • <text><!-- ... --></text>
    • <text><!-- ... --></text>
    • <text><!-- ... --></text>
    • </group>
    • <back><!-- ... --></back>
    • </text>
    Using <group> for Editions
  • 58.
    • <div1 n="I" type="Act">
    • <head> Act I </head>
    • <div2 type="scene">
    • <head> Scene 1 </head>
    • <stage type="entrance"> Enter Fay </stage>
    • <sp>
    • <speaker> Fay </speaker>
    • <p> I say, Dinah, has anyone seen my gloves? </p>
    • </sp>
    • <stage type="entrance"> Enter Dinah </stage>
    • <sp>
    • <speaker> Dinah </speaker>
    • <p> No, miss, perhaps the parakeet has got them again? </p>
    • </sp>
    • <stage type="exit"> Exit Fay and Dinah </stage>
    • </div2>
    • </div1>
    Drama
  • 59.
    • <div type="letter">
    • <opener>
    • <dateline>
    • <date when="1865-08-05"> August the 5th </date>
    • <name type="place"> Cape Cod </name>
    • </dateline>
    • <salute> My dear <name type="person"> Becky </name></salute>
    • </opener>
    • <p> How lovely the oysters are this evening! </p>
    • <closer>
    • <salute> Yours very truly </salute>
    • <signed><name type="person"> Maria </name></signed>
    • </closer>
    • </div>
    Letters
  • 60.
    • <lg type="sonnet">
    • <head>On First Looking into Chapman’s Homer</head>
    • <lg type="quatrain">
    • <l>Much have I travell’d in the realms of gold,</l>
    • <l>And many goodly states and kingdoms seen;</l>
    • <l>Round many western islands have I been</l>
    • <l>Which bards in fealty to <persName>Apollo</persName> hold.</l>
    • </lg>
    • <lg type="quatrain">
    • <l>Oft of one wide expanse had I been told</l>
    • <l>That deep-brow’d <persName>Homer</persName> ruled as his demesne;</l>
    • <l>Yet did I never breathe its pure serene</l>
    • <l>Till I heard <persName>Chapman</persName> speak out loud and bold:</l>
    • </lg>
    • <lg type="sestet">
    • <l>Then felt I like some watcher of the skies</l>
    • <l>When a new planet swims into his ken;</l>
    • <l>Or like stout <persName>Cortez</persName> when with eagle eyes</l>
    • <l>He star’d at the <placeName>Pacific</placeName>—and all his men</l>
    • <l>Look’d at each other with a wild surmise&#x2014;</l>
    • <l>Silent, upon a peak in <placeName>Darien</placeName>.</l>
    • </lg>
    • </lg>
    Verse. <persName> and <placeName> are optional at Level 3 and used at Level 4.
  • 61.
    • Warp Speed, Ms. Bright!
    • There was a young lady named Bright,
    • Who travelled much faster than light,
    • She departed one day,
    • In a relative way,
    • And returned on the previous night.
    Verse Example
  • 62.
    • <?xml version="1.0" encoding="UTF-8"?>
    • <lg type="limerick" rhyme="aabba" n="3">
    • <head>Warp Speed, Ms. Bright!</head>
    • <l>There was a young lady named <rhyme label="a">Bright</rhyme>,</l>
    • <l>Who travelled much faster than <rhyme label="a">light</rhyme>,</l>
    • <l>She departed one <rhyme label="b">day</rhyme>,</l>
    • <l>In a relative <rhyme label="b">way</rhyme>,</l>
    • <l>And returned on the previous <rhyme label="a">night</rhyme>.</l>
    • </lg>
    TEI P5 Level 5 Rendering
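Because every <rhyme> carries a label=, the scheme declared on <lg rhyme="..."> can be checked against the lines themselves. A minimal sketch with Python's ElementTree, assuming no namespace and at most one <rhyme> per line (a line without one is marked '-'):

```python
import xml.etree.ElementTree as ET

LIMERICK = """<lg type="limerick" rhyme="aabba" n="3">
<head>Warp Speed, Ms. Bright!</head>
<l>There was a young lady named <rhyme label="a">Bright</rhyme>,</l>
<l>Who travelled much faster than <rhyme label="a">light</rhyme>,</l>
<l>She departed one <rhyme label="b">day</rhyme>,</l>
<l>In a relative <rhyme label="b">way</rhyme>,</l>
<l>And returned on the previous <rhyme label="a">night</rhyme>.</l>
</lg>"""


def scheme_from_lines(lg):
    """Concatenate the rhyme label of each <l>, in document order."""
    labels = []
    for line in lg.findall("l"):
        rhyme = line.find("rhyme")
        labels.append(rhyme.get("label") if rhyme is not None else "-")
    return "".join(labels)


lg = ET.fromstring(LIMERICK)
found = scheme_from_lines(lg)
print(found, found == lg.get("rhyme"))  # aabba True
```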
  • 63. PART 7: Some Common Practices in Text Encoding
  • 64.
    • 1. Make sure your document is linked to a schema:
    • <?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
    • 2. Include as many of the header elements as possible, including <editorialDecl>:
    • <editorialDecl> <hyphenation eol="none"> <p>End-of-line hyphens have been removed and the divided words rejoined.</p> </hyphenation>
    • </editorialDecl>
    • 3. Make use of spell-check: F4 key
    • 4. Delete end-of-line hyphens that split words, except in special cases (e.g., poetry, dramatic verse):
    • Institu-tion becomes Institution
    Using oXygen
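Practice 4, together with the <hyphenation eol="none"> declaration, amounts to rejoining words split by a line-end hyphen. A rough sketch using a regular expression, assuming the hyphen is the last character on its line and the word resumes at the start of the next. It cannot tell a split word from a genuine compound that happens to break at its hyphen (well-known would become wellknown), which is one reason special cases still need a human eye.

```python
import re

# A word fragment, a hyphen at end of line, then the rest of the word on the next line.
EOL_HYPHEN = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def dehyphenate(text):
    """Rejoin words split across lines by an end-of-line hyphen."""
    return EOL_HYPHEN.sub(r"\1\2", text)


print(dehyphenate("the Institu-\ntion was founded"))  # the Institution was founded
```

Hyphens inside a line are left alone, so "a well-known fact" passes through unchanged.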
  • 65.
    • 5. Include a <respStmt> in the header for any emendations or corrections you later make to a previously encoded text:
    • <respStmt> <name xml:id="rlw54">Richard Wisneski</name>
    • <!-- one resp per respStmt --> <resp>TEI Header creator</resp>
    • <!-- OR TEI Header and document creator --> </respStmt>
    • 6. Page breaks must be inserted, using <pb n="[page number]" xml:id="[unique ID]"/>
    • <pb xml:id="fleboo-032" n="33"/>
    • OR, to have the page break reference the specific page image:
    • <pb facs="fleboo-032.jp2" n="33"/>
    • The xml:id attribute MUST (a) be unique and (b) start with a letter or underscore, with no spaces or colons
    • The facs attribute MAY link to a permanent URL or URI
    Common Practices (continued)
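Both xml:id rules can be checked before validation. A sketch in Python, assuming a simplified test for legal names (ASCII letters, digits, '.', '-', and '_', starting with a letter or underscore); the XML rules for these names (NCNames) also allow many non-ASCII letters.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Simplified NCName: starts with a letter or underscore; no spaces or colons.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*$")


def id_problems(root):
    """Return a sorted list of duplicate or malformed xml:id values in the tree."""
    ids = [el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None]
    problems = [f"duplicate: {i}" for i, n in Counter(ids).items() if n > 1]
    problems += [f"invalid: {i!r}" for i in set(ids) if not NCNAME.match(i)]
    return sorted(problems)


doc = ET.fromstring(
    '<div><pb xml:id="fleboo-032" n="33"/><pb xml:id="fleboo-032" n="34"/>'
    '<pb xml:id="32a" n="35"/></div>'
)
print(id_problems(doc))
```

Here the repeated fleboo-032 is reported as a duplicate and 32a as invalid, because it starts with a digit.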
  • 66.
    • As part of a page:
    • <figure>
    • <figDesc>An illuminated page from the de Brailes Hours, containing a historiated initial with a signed self-portrait of William de Brailes
    • </figDesc>
    • <graphic url="./gfx/debrailes_ms.jpg" height="600px"/>
    • </figure>
    Inserting Images. <figDesc> is not required at Level 3, but we use it to capture the image caption or, if there is no caption, to describe the image.
  • 67. Footnotes
  • 68.
    • <p>FIRST ANNUAL REPORT<note place="foot" rend="*">This was the first report made after the schools were regularly organized under the ordinance. The Bethel School mentioned in the opening paragraph had existed through a part of the previous year, the year 1836, and a Board appointed to look after its interests, had made an informal— probably oral — report.</note> OF THE- BOARD OF MANAGERS OF COMMON SCHOOLS.</p>
    Footnote Encoded (and Marginalia). If this note were in the MARGIN of the page, it would be encoded, for example, as <note type="auth" place="margin-left"> text, text, text </note>. The type= and rend= attributes are optional.
  • 69.
    • In the body of the text:
    • <p>He liked to eat pie<ptr target="#note1" rend="1"/>.</p>
    • OR…
    • <p>See <ref target="#note1">Note 30</ref></p>
    • At the end of the text:
    • <div type="endnotes"> <pb n="130"/> <p xml:id="note1">Pie is a dessert often eaten.</p> </div>
    Endnotes
  • 70. Another Method
  • 71.
    • <l>(Diff'rent our parties, but with equal grace</l>
    • <l>The Goddess smiles on Whig and Tory race,
    • <ptr rend="unmarked" target="#note3.284"/>
    • </l>
    • <l>'Tis the same rope at sev'ral ends they twist,</l>
    • <l>To Dulness, Ridpath is as dear as Mist)</l>
    • <note xml:id="note3.284" type="imitation" place="foot" anchored="false"> <bibl>Virg. Æn. 10.</bibl> <quote> <l>Tros Rutulusve fuat; nullo discrimine habebo.</l> <l>—— Rex Jupiter omnibus idem.</l> </quote>
    • </note>
    Encode with Pointer and Link
  • 72. PART 8: Encoding References
  • 73.
    • Definition: things we know about the content of the text that we want to state explicitly, to add value to the text or to help the reader understand it better, such as:
      • Authority control: information about the identity of things named in the text: people, places, books, etc.
      • Additional information about: birthdates, geographical locations, date published, etc.
      • Interpretive information: themes, keywords
      • Normalization of measurements, dates, etc.
    Encoding Contextual Information
  • 74.
    • Names
    • <persName>Baron Olivier of Brighton</persName>
    • <placeName>New York</placeName>
    • <orgName>Podunk Sewing Club</orgName>
    • Linguistic: <foreign>, <distinct>, <soCalled>, <mentioned>, <term>, <emph>
    • <distinct>dinna ken</distinct> why that
    • <foreign xml:lang="fr">soi-disant</foreign> <soCalled>expert</soCalled>
    • must be <emph>so</emph> particular about pronouncing
    • <mentioned xml:lang="cy">Llandaff</mentioned> using a
    • <term>voiceless lateral fricative</term>
    Common Tags for Contextual Information
  • 75.
    • ’Ographies:
      • prosopography (personography)
      • gazetteers (placeography)
      • orgography, bibliography
    • These are like local authority lists that you create
    • Keywords applied to the text as a whole
    • Thematic or interpretive information applied to specific places in the text
    Types of Contextual Information
  • 76.
    • Like a local name authority file
    • Can be simple or very detailed
    • Can be kept in your encoded file or externally
    • Includes specific elements for the most common data
    • Also includes general elements for the unforeseen
    Personography
  • 77.
    • <teiHeader>
    • <!-- ... -->
    • <profileDesc>
    • <particDesc>
    • <listPerson>
    • <person xml:id="andrew_j_steere">
    • <persName>Steere, Andrew J.</persName>
    • <birth when="1844">
    • <placeName ref="#l_scituate">Scituate, RI</placeName>
    • </birth>
    • <death notBefore="1918"/>
    • </person>
    • <person xml:id="george_pope_morris">
    • <persName>Morris, George Pope</persName>
    • <birth when="1802">
    • <placeName>Philadelphia, PA</placeName>
    • </birth>
    • <death when="1864"/>
    • </person>
    • </listPerson>
    • </particDesc>
    • </profileDesc>
    • </teiHeader>
    • <text>
    • <body>
    • <p>...However, the plea of Woodman spare that tree and the patriotic pride
    • of the owner, <persName ref="#andrew_j_steere">Mr. Andrew J. Steere</persName>,
    • had guaranteed its safety from the woodsman’s axe.
    • </p>
    • </body>
    • </text>
    Personography Encoding: TEI header, participation description (<particDesc>), <listPerson>, <person>
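Because <persName ref="#..."> in the text points at a <person xml:id="..."> in the header, the two can be joined to bring authority data alongside each mention. A minimal sketch with Python's ElementTree, assuming no namespace and a person list abridged to one entry:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged from the slide: one <person> and one mention; TEI namespace omitted.
DOC = """<TEI>
<teiHeader><profileDesc><particDesc><listPerson>
  <person xml:id="andrew_j_steere">
    <persName>Steere, Andrew J.</persName>
    <birth when="1844"><placeName>Scituate, RI</placeName></birth>
  </person>
</listPerson></particDesc></profileDesc></teiHeader>
<text><body><p>... the owner, <persName ref="#andrew_j_steere">Mr. Andrew J. Steere</persName>, ...</p></body></text>
</TEI>"""


def mentions_with_authority(root):
    """Pair each <persName ref> in the text with the authority entry it points to."""
    people = {p.get(XML_ID): p for p in root.iter("person")}
    rows = []
    for name in root.find("text").iter("persName"):
        ref = name.get("ref")
        if ref is None:
            continue  # an untagged mention: nothing to resolve
        person = people[ref.removeprefix("#")]  # a KeyError means a dangling ref
        rows.append((name.text, person.findtext("persName"), person.find("birth").get("when")))
    return rows


print(mentions_with_authority(ET.fromstring(DOC)))
```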
  • 78.
    • Very similar to personography...but for places
    • Can be linked to maps via geographic information data
    Placeography (Gazetteer)
  • 79.
    • <body>
    • <p>The tree stood about a mile east of <placeName ref="#l_chepachet">Chepachet</placeName> and a mile north of <placeName ref="#l_spring_grove">Spring Grove</placeName> ... </p>
    • <!-- ... -->
    • </body>
    • <back>
    • <div type="editorial">
    • <listPlace>
    • <place type="state" xml:id="l_rhode_island">
    • <placeName>The State of Rhode Island and Providence Plantations</placeName>
    • <country>United States of America</country>
    • <region>New England</region>
    • </place>
    • <place type="settlement" xml:id="l_chepachet">
    • <placeName>Chepachet</placeName>
    • <region ref="#l_rhode_island"/>
    • <location>
    • <geo>41.915131 -71.671397</geo>
    • </location>
    • </place>
    • <place type="settlement" xml:id="l_spring_grove">
    • <placeName>Spring Grove</placeName>
    • <region ref="#l_rhode_island"/>
    • <location>
    • <geo>41.905583 -71.656219</geo>
    • </location>
    • </place>
    • </listPlace>
    • </div>
    • </back>
    Placeography Encoding: <back>, <div>, <place>
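Without a <geoDecl> in the header, TEI takes the content of each <geo> to be a latitude followed by a longitude in decimal degrees (WGS 84), which is what lets a placeography feed a map. As a small illustration, the sketch below reads the two settlements' coordinates and computes the great-circle distance between them with the haversine formula, assuming a spherical Earth of mean radius 6371 km (TEI namespace omitted).

```python
import math
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
PLACES = """<listPlace>
<place type="settlement" xml:id="l_chepachet"><placeName>Chepachet</placeName>
  <location><geo>41.915131 -71.671397</geo></location></place>
<place type="settlement" xml:id="l_spring_grove"><placeName>Spring Grove</placeName>
  <location><geo>41.905583 -71.656219</geo></location></place>
</listPlace>"""


def coords(list_place):
    """Map xml:id to (lat, lon) read from each place's <geo> ('lat lon', decimal degrees)."""
    out = {}
    for place in list_place.iter("place"):
        lat, lon = map(float, place.findtext("location/geo").split())
        out[place.get(XML_ID)] = (lat, lon)
    return out


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points on a sphere."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


pts = coords(ET.fromstring(PLACES))
km = haversine_km(pts["l_chepachet"], pts["l_spring_grove"])
print(f"{km:.2f} km")
```

For these two points the result comes out at roughly 1.6 km, about a mile.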
  • 80.
    • To associate a keyword or interpretive concept with a word, phrase, or passage of text:
    • <body>
    • <div type="section">
    • <p>However, the plea of <quote>Woodman spare that tree</quote> and the <seg ana="#patriotism">patriotic pride of the owner</seg>, <persName>Mr. Andrew J. Steere</persName>, had <seg ana="#conservation">guaranteed its safety from the woodsman’s axe</seg>...</p>
    • </div>
    • </body>
    • <back>
    • <div type="editorial">
    • <interpGrp>
    • <interp xml:id="ri_history">Rhode Island local history</interp>
    • <interp xml:id="patriotism">Patriotism and references to the war effort</interp>
    • <interp xml:id="commercial">References to commercial harvesting and use of trees</interp>
    • <interp xml:id="conservation">Conservation efforts and protection of trees</interp>
    • <interp xml:id="arboriculture">References to tree species and their cultivation</interp>
    • </interpGrp>
    • </div>
    • </back>
    Interpretative Keywords and Themes
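Since each ana= value points at an <interp> ID, the passages tagged with each theme can be gathered automatically. A short sketch, assuming no namespace and markup abridged from the slide (with a straight apostrophe for simplicity):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged from the slide; TEI namespace omitted.
DOC = """<text><body><p>However, the plea of <quote>Woodman spare that tree</quote> and the
<seg ana="#patriotism">patriotic pride of the owner</seg>, Mr. Andrew J. Steere, had
<seg ana="#conservation">guaranteed its safety from the woodsman's axe</seg>...</p></body>
<back><div type="editorial"><interpGrp>
<interp xml:id="patriotism">Patriotism and references to the war effort</interp>
<interp xml:id="conservation">Conservation efforts and protection of trees</interp>
</interpGrp></div></back></text>"""


def passages_by_theme(root):
    """Group the text of every element carrying ana= under its <interp> description."""
    themes = {i.get(XML_ID): i.text for i in root.iter("interp")}
    grouped = defaultdict(list)
    for el in root.iter():
        for pointer in (el.get("ana") or "").split():  # ana= may hold several pointers
            grouped[themes[pointer.removeprefix("#")]].append("".join(el.itertext()))
    return dict(grouped)


for theme, passages in passages_by_theme(ET.fromstring(DOC)).items():
    print(f"{theme}: {passages}")
```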
  • 81. PART 9: Some Complex Situations in Text Encoding
  • 82.
    • Overlapping occurs especially with older texts.
    • XML elements may not overlap, but document structures often do.
    • Examples include:
      • Physical features like pages, columns, and lines, and textual things like paragraphs or names
      • Verse lines and quotations, names, other phrasal elements
      • Verse lines and linguistic features
      • Dramatic speeches and verse lines
      • Handwritten additions or deletions and other structures
      • Typographical features and linguistic features
    Overlapping
  • 83.
    • Mortal, she said, "I'm sent to you,
    • Then hold my precepts fast;
    • Remember earth’s best joys are few,
    • And can’t for ever last."
    Example
  • 84.
    • <lg type=&quot;stanza&quot;>
    • <l>Mortal, she said, <said xml:id="s01" next="#s02">I'm sent to you,</said></l>
    • <l><said xml:id="s02" next="#s03" prev="#s01">Then hold my precepts fast;</said></l>
    • <l><said xml:id="s03" prev="#s02" next="#s04">Remember earth’s best joys are few,</said></l>
    • <l><said xml:id="s04" prev="#s03">And can’t for ever last.</said></l>
    • </lg>
    One Possible Solution
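The next= and prev= pointers let software reassemble the speech that the verse lines break up. A sketch, assuming no namespace and a single, non-circular chain, that starts from the fragment without prev= and follows next=:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The stanza from the slide, with straight apostrophes; TEI namespace omitted.
STANZA = """<lg type="stanza">
<l>Mortal, she said, <said xml:id="s01" next="#s02">I'm sent to you,</said></l>
<l><said xml:id="s02" next="#s03" prev="#s01">Then hold my precepts fast;</said></l>
<l><said xml:id="s03" prev="#s02" next="#s04">Remember earth's best joys are few,</said></l>
<l><said xml:id="s04" prev="#s03">And can't for ever last.</said></l>
</lg>"""


def joined_speech(root):
    """Start at the <said> without prev= and follow next=, joining the fragments."""
    parts = {s.get(XML_ID): s for s in root.iter("said")}
    current = next(s for s in parts.values() if s.get("prev") is None)
    texts = []
    while current is not None:
        texts.append(current.text)
        target = current.get("next")
        current = parts[target.removeprefix("#")] if target else None
    return " ".join(texts)


print(joined_speech(ET.fromstring(STANZA)))
```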
  • 85. Transcriptional Complexities
    • <p>Johnston etc 1764 Mr Nikl<unclear>e</unclear> <supplied>s</supplied><gap reason="folded" extent="unknown"/> Brown <unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear> to ten Rum Barels at Four pound &per; Barel — — — £40</p>
  • 86.
    • Marks a boundary point separating any kind of section of a text where the change is not represented by a structural element
    • Dedicated empty elements mark page breaks (<pb>), line breaks (<lb>), and column breaks (<cb>); <milestone> marks other boundaries, such as signatures (unit="sig")
    • Example:
    • <pb n="249"/>
    • <milestone unit="sig" n="R5r"/>
    • <lb/>digested. Its long trunk, as seen slanting down from
    • <lb/>out of the building across the wharf and into the ship,
    • <lb/>will have been—a farthing.</p>
    • <pb n="250"/>
    • <milestone unit="sig" n="R5v"/>
    Milestones
  • 87.
    • Available on every element when additional tagset for segmentation & alignment is used
    • To count lines or indicate metrical patterns when a verse line is divided between speakers, mark each fragment with part= (I = initial, M = medial, F = final)
    • <sp>
    • <speaker>Leo.</speaker>
    • <l part="F">Go on, go on:</l>
    • <l>Thou canst not speake too much, I have deserv’d</l>
    • <l part="I">All tongues to talk their bittrest.</l>
    • </sp>
    • <sp>
    • <speaker>Lord.</speaker>
    • <l part="F">Say no more;</l>
    • <l>How ere the business goes, you have made fault</l>
    • <l part="I">I’th boldnesse of your speech.</l>
    • </sp>
    • <sp>
    • <speaker>Pauline.</speaker>
    • <l part="F">I am sorry for’t;</l>
    • <l>All faults I make, when I shall come to know them</l>
    • <!-- ... -->
    • </sp>
    Fragmentation
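When a verse line is shared between speakers, the part= values let you recover the full metrical line, for example for line counting. A sketch, assuming no namespace and that fragments appear in document order as I, optional M, then F. A part="F" at the start of the excerpt closes a line begun outside it, so it is shown with a leading "..."; other part= values are treated as whole lines.

```python
import xml.etree.ElementTree as ET

SCENE = """<div>
<sp><speaker>Leo.</speaker>
<l part="F">Go on, go on:</l>
<l>Thou canst not speake too much, I have deserv'd</l>
<l part="I">All tongues to talk their bittrest.</l></sp>
<sp><speaker>Lord.</speaker>
<l part="F">Say no more;</l>
<l>How ere the business goes, you have made fault</l>
<l part="I">I'th boldnesse of your speech.</l></sp>
<sp><speaker>Pauline.</speaker>
<l part="F">I am sorry for't;</l></sp>
</div>"""


def metrical_lines(root):
    """Rebuild whole verse lines from part="I"/"M"/"F" fragments, even across speeches."""
    lines, pending = [], []
    for verse in root.iter("l"):
        part = verse.get("part")
        if part in ("I", "M"):
            pending.append(verse.text)      # the line continues in a later fragment
        elif part == "F":
            if not pending:                  # its beginning lies outside this excerpt
                pending.append("...")
            lines.append(" ".join(pending + [verse.text]))
            pending = []
        else:
            lines.append(verse.text)         # an undivided line
    if pending:                              # a line left unfinished at the end
        lines.append(" ".join(pending + ["..."]))
    return lines


for line in metrical_lines(ET.fromstring(SCENE)):
    print(line)
```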
  • 88. Parallelism
  • 89. Encoding Parallel Structures
    • <lg type="stanza" xml:lang="fr">
    • <l xml:id="fr2.01" corresp="#en2.01">Nos péchés sont têtus, nos repentirs sont lâches;</l>
    • <l xml:id="fr2.02" corresp="#en2.02">Nous nous faisons payer grassement nos aveux,</l>
    • <l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
    • <l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
    • </lg>
    • <lg type="stanza" xml:lang="en">
    • <l xml:id="en2.01" corresp="#fr2.01">Our sins are stubborn, craven our repentance.</l>
    • <l xml:id="en2.02" corresp="#fr2.02">For our weak vows we ask excessive prices.</l>
    • <l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
    • <l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
    • </lg>
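Because corresp= links each French line to its English counterpart, with lines 3 and 4 crossed, the pairs can be printed side by side in the reading order of either stanza. A sketch, assuming no namespace and using only the last two lines of each stanza:

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DOC = """<div>
<lg xml:lang="fr">
<l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
<l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
</lg>
<lg xml:lang="en">
<l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
<l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
</lg>
</div>"""


def aligned(root, lang="fr"):
    """Return (line, counterpart) pairs in the reading order of the stanza in `lang`."""
    text_by_id = {el.get(XML_ID): el.text for el in root.iter("l")}
    stanza = next(lg for lg in root.iter("lg") if lg.get(XML_LANG) == lang)
    return [(el.text, text_by_id[el.get("corresp").removeprefix("#")])
            for el in stanza.iter("l")]


for fr, en in aligned(ET.fromstring(DOC)):
    print(f"{fr}  |  {en}")
```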
  • 90.
    • <p>...with them, bycause they woulde
    • <lb/>not be
    • <choice>
    • <abbr>boūde</abbr>
    • <expan>bounde</expan>
    • </choice>
    • also for an other wo[see below]
    • <lb/>mā at theyr pleasure, whom they
    • <lb/>knewe not, nor yet what matter
    • <lb/>was layed unto her charge. Not
    • <lb/>wythstandynge at the laste, after
    • <lb/>moche a do and reasonyng to and
    • <lb/>fro, they toke a bonde of them of
    • <lb/>recognisaunce for my fourth com
    • <lb/>mynge. And thus I was at the
    • <lb/>last,
    • <choice>
    • <orig>delyuered</orig>
    • <reg>delyvered</reg>
    • </choice>.
    • Written by me An
    • <lb/>ne Askewe.
    • </p>
    Textual Splitting: <choice>
  • 91. Textual Splitting (continued)
    • … <choice>
    • <abbr> <choice> <sic>wo<lb/>mā</sic> <corr>wo-<lb/>mā</corr> </choice> </abbr>
    • <expan> <choice> <sic>wo<lb/>man</sic> <corr>wo-<lb/>man</corr> </choice> </expan>
    • </choice>
    • also for an other wo[see below]
    • <lb/>mā at theyr pleasure, whom they
    You can put <choice> inside an element such as <name> or outside it, and <choice> elements can be nested (as here).
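Because <choice> keeps both readings, one file can yield either a diplomatic or a normalized text. A sketch, assuming a simple policy (take <abbr>, <orig>, and <sic> for the diplomatic view; <expan>, <reg>, and <corr> for the normalized one), no namespace, and that any other element, such as <lb/>, simply contributes its own text. The recursion also handles nested <choice> like the example above.

```python
import xml.etree.ElementTree as ET

DIPLOMATIC = {"abbr", "orig", "sic"}
NORMALIZED = {"expan", "reg", "corr"}


def reading(el, keep):
    """Flatten el to text, taking from each <choice> only the child whose tag is in keep."""
    out = [el.text or ""]
    for child in el:
        if child.tag == "choice":
            picked = next((c for c in child if c.tag in keep), None)
            if picked is not None:
                out.append(reading(picked, keep))  # recursion handles nested <choice>
        else:
            out.append(reading(child, keep))
        out.append(child.tail or "")
    return "".join(out)


P = ET.fromstring(
    "<p>not be <choice><abbr>boūde</abbr><expan>bounde</expan></choice>. "
    "And thus I was at the last, "
    "<choice><orig>delyuered</orig><reg>delyvered</reg></choice>.</p>"
)
print(reading(P, DIPLOMATIC))  # not be boūde. And thus I was at the last, delyuered.
print(reading(P, NORMALIZED))  # not be bounde. And thus I was at the last, delyvered.
```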
  • 92.
    • For an unclear reading, you can put several alternative <unclear> elements within <choice>
    • Use the <unclear>, <supplied>, and <gap> elements for what you can't read in the written manuscript
    • Example:
    • Brown <unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear>
    Unclear Texts
  • 93.
    • <teiHeader>
    • <fileDesc>
    • <!-- ... -->
    • <notesStmt>
    • <note>The historiated capital on folio 43r contains
    • a signed self-portrait of William de Brailes. The
    • text is richly illuminated ...</note>
    • </notesStmt>
    • <!-- ... -->
    • </fileDesc>
    • </teiHeader>
    Descriptive Prose for Images
  • 94. Formalization for Images
    • <physDesc>
    • <objectDesc form="codex">
    • <supportDesc material="vellum">
    • <extent> <dimensions> <height quantity="150" unit="mm"/> <width quantity="124" unit="mm"/> </dimensions> </extent>
    • </supportDesc>
    • <layoutDesc> <layout columns="1" ruledLines="12"/> </layoutDesc>
    • </objectDesc>
    • <handDesc> <handNote scribe="william_debrailes" script="carolingian" medium="ink" scope="sole"/> </handDesc>
    • </physDesc>
    Practice
  • 95. PART 10: TEI Guidelines
  • 96.
    • The outward expression of TEI to the user is the TEI guidelines: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html
    • These guidelines include schemas – formal rules for encoding documents – and documentation to explain how to apply these rules
    • Customization is possible to apply TEI markup as strict or as loose as one wishes
    • TEI Guidelines = Your instruction manual
    TEI Guidelines
  • 97.
    • Topically – into chapters and modules
    • Semantically and functionally – into classes
    • Pedagogically – into discussion and reference
    • TEI provides the tools. You have to ask: What is my project? What are my needs? What level of granularity do I want?
    TEI Guidelines Divisions
  • 98.
    • Some modules are required because they supply core elements (e.g., the module that defines the TEI header)
    • Some modules provide genre- or discipline-specific elements: e.g., the verse module contains <rhyme> and <caesura>
    • Some modules provide functionality: e.g., the linking module contains markup for identifying arbitrary passages and recording links between them
    Chapters and Modules
  • 99.
    • Classes are functional or semantic groups of elements.
    • Two kinds:
    • model class: elements that appear in the same place in the logical structure
      • e.g., model.milestoneLike contains those milestone elements used to describe the physical and typographic structure of a codex: pb, cb, lb, and milestone
    • attribute class: elements that share a common attribute
      • e.g., att.ascribed is the class of elements that share the who attribute, i.e. can be attributed to an individual, including change, q, and sp
    Classes
  • 100.
    • Classes are composed of elements and sometimes other classes:
    • elements and attribute classes can be members of attribute classes
    • a given element (or attribute class) may be a member of more than one attribute class
    • elements, model classes, or attribute classes can be members of model classes
    • a given element (or class) may be a member of more than one model class
    Class Constituency
  • 101.
    • Datatypes are constraints on (attribute) values.
    • May limit values to things like:
    • a supplied list of possible values:
      • e.g. sex of person
    • a user-provided list of possible values:
      • e.g. type of div
    • strings that conform to a specific format:
      • e.g. when of date, extent of gap, or xml:lang
    Datatypes
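The three kinds of constraint can be imitated in a few lines, which may help when deciding what to restrict in a customization. A sketch in which all three value sets are hypothetical examples (not TEI's own lists); the when= test accepts only YYYY, YYYY-MM, and YYYY-MM-DD, whereas the real TEI datatype admits further W3C date and time forms.

```python
import re

# Hypothetical constraints, one for each kind of datatype on the slide.
SUPPLIED = {"sex": {"M", "F", "U"}}                             # a fixed list of values
PROJECT = {"type": {"chapter", "section", "preface", "index"}}  # a project-defined list
FORMATS = {"when": re.compile(r"\d{4}(-\d{2}(-\d{2})?)?$")}     # simplified date format


def check(attr, value):
    """Return True if value satisfies the constraint defined for attr (if any)."""
    if attr in SUPPLIED:
        return value in SUPPLIED[attr]
    if attr in PROJECT:
        return value in PROJECT[attr]
    if attr in FORMATS:
        return FORMATS[attr].match(value) is not None
    return True  # no constraint defined for this attribute in the sketch


print(check("when", "1865-08-05"), check("when", "August the 5th"), check("type", "chapitre"))  # True False False
```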
  • 102.
    • contain:
    • A prose discussion of encoding topics,
    • A reference section, and
    • Schemas that constrain encoding,
    • all generated from a single source written in ODD (One Document Does It All)
    TEI Guidelines…
  • 103. PART 11: Schema
  • 104. TEI Under the Hood: http://www.tei-c.org/Roma/
  • 105.
    • Roma, the web interface to the TEI customization mechanism, performs two functions:
    • Provides a user interface for editing TEI ODD files
    • Provides a user interface for generating TEI schemas and documentation
    ROMA, The Web Tool
  • 106.
    • Select which modules to use and not to use
    • Choose what elements in those modules to use and not to use
    • Choose what attributes on those elements to use and not to use
    • Change element or attribute names
    • Restrict the values of attributes – IMPORTANT
    • Constrain structure
    • Add new elements
    • Add new attributes
    • Produce an internationalized version of the TEI
    Customization Options
  • 107.
    • Customize:
    • Values for Type= Attributes
    • Values for any kind of information for which a controlled vocabulary exists
    • However:
    • Do NOT remove the header
    • Do NOT remove required elements from the header
    • Do NOT duplicate things which the TEI has already provided
    What To Customize and Not Customize
  • 108. Click Submit
  • 109. STEP 1: Enter (1) a title, (2) a filename (all one word, no special characters), (3) an author name, and (4) a description. Leave everything else as is. STEP 2: Click “submit”. STEP 3: Click the MODULES tab.
  • 110.
    • Do NOT remove the “List of selected Modules” given to you (but click link to change attributes)
    • Click links under “Module Name” to:
    • a. See what elements are available
    • b. Exclude elements
    • c. Rename elements
    • d. Change attributes
    3. To change values for attributes, click the attribute itself.
    4. Click “add” on the left-hand side to add a module, with its elements and attributes.
  • 111. Clicking “add” inserts this module here. Next, click “namedates”
  • 112. You can exclude elements. Click to change attributes.
  • 113. Click on these links to change the values for the attributes. When you’re done changing elements and attributes, click here.
  • 114. Change the values for the attributes (type=). Click if no changes are made (this bypasses “submit query”). Click when finished.
  • 115.
    • Return to your list; note that if you changed any attributes, the “changes” column indicates this.
    • Click the “Schema” tab
  • 116.
    • The best choice is either “RELAX NG schema (compact syntax)” or “W3C schema”
    • Click “Documentation” to save documentation for this schema
    • Click “Sanity Checker” to make sure your schema validates
    • NOTE: You can return to the Roma tool to later edit your schema
  • 117. Conclusion
  • 118.
    • More and better documentation
    • More use (and support for use) by individuals
    • More discipline-specific customizations
    Future Trends in TEI
  • 119.
    • Historical Event Markup Language (HEML): http://www.heml.org/heml-cocoon/
    • Music Markup Language: http://www.musicmarkup.info/
    • Multi-Element Coding System: http://helmer.hit.uib.no/claus/mecs/mecs.htm
    Other Encoding Possibilities
  • 120.
    • WWP Guide to Scholarly Text Encoding: http://www.wwp.brown.edu/encoding/guide/index.html
    • TEI web site: http://www.tei-c.org/index.xml
    • The TEI listserv (TEI-L)
    • TEI Wiki: http://www.tei-c.org/wiki/index.php/Main_Page
    • Teach Yourself TEI: http://www.tei-c.org/Support/Learn/tutorials.xml
    • Guidelines for Text Encoding and Interchange: http://quod.lib.umich.edu/t/tei/
    • A Gentle Introduction to XML: http://www.tei-c.org/release/doc/tei-p4-doc/html/SG.html
    • A Companion to Digital Literary Studies: http://www.digitalhumanities.org/companion/DLS/
    References