UVA MDST 3073 Texts and Models-2012-09-11


Published in: Technology

  • Text becomes reducible to its elements
  • Basic feature of the medium
  • (theoretically)
  • http://biblioklept.org/2012/01/31/list-of-rejections-of-wittgensteins-mistress-david-markson/

    1. 1. Lecture 4: Texts and Models Prof. Alvarado MDST 3703/7703 11 September 2012
    2. 2. Review
        • Posting “Hello, World!”
            – Put file in the public_html directory of your UVA Home Directory
            – Create a post and insert a link to this file
            – Categorize as: 09.06: (S) HTML
        • If you cannot get to your home directory, try uploading to http://homedir.virginia.edu
    3. 3. Some Quick Corrections
        • Digital text is not necessary
            – It’s an open question (i.e. do we have to have it?)
        • Nelson did not conceive of “trails,” Bush did
        • HTML is not the “first big idea” in the liberal arts; hypertext is (according to me)
        • The idea that “text shapes knowledge” is not ancient, but relatively new
            – Media determinism is a 20th century perspective
            – Although Plato notes the effects of literacy in the Phaedrus
        • Not everything can be translated into HTML
            – i.e. HTML is not the richest framework for digital representation
    4. 4. Your Questions and Observations
        • Is commercialization killing creativity?
            – What is the relationship between how the web is organized economically and how it shapes expression? → EFFECT OF SOCIAL ORGANIZATION
        • What happens if the associations that someone makes are “off” and illogical to others?
            – Does it loosen the way logical connections can be made and argued? → EFFECT ON LOGIC
    5. 5. Your Questions and Observations
        • Computers in general still rely heavily on a hierarchical structure
            – To what extent has rationalization occurred with the invention of hypertext?
        • Do things lose value and meaning in exchange for digital coding?
            – What is the effect of digitization on value?
        • Hypertexts and links online can be distracting
            – Non-linear thinking or mindless surfing?
    6. 6. Your Questions and Observations
        • People are trying to create the same exact classroom experience online that exists in the physical classroom, which is impossible
            – We need to rethink and restructure the online learning experience as a new and unique learning experience
        • How can we keep hypertext from altering us too much?
        • The beauty and the risk of an open source web
    7. 7. Practical Questions
        • How can an HTML webpage on your own computer be found by the search bar but not be on the web?
            – Your browser lives on your machine
            – The protocol name tells it where to look
        • I wondered if the picture from my computer would still show up if I opened the page from another computer?
        • It is interesting to see how one little thing out of place can ruin the entire code
            – Computers are stupid in that way
        • Why should coders learn HTML?
            – HTML is an interface language that can be easily generated from print statements in your code
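The last point above, that HTML can be generated from print statements, can be illustrated with a minimal Python sketch. The function name `render_list` and the sample page content are invented for illustration and do not come from the lecture.

```python
from html import escape


def render_list(title, items):
    """Build a small HTML page, as a string, from plain Python data.

    render_list is a hypothetical helper written for this example.
    """
    lines = [
        "<html>",
        "<head><title>%s</title></head>" % escape(title),
        "<body>",
        "<h1>%s</h1>" % escape(title),
        "<ul>",
    ]
    for item in items:
        # escape() turns characters such as & and < into HTML entities
        lines.append("<li>%s</li>" % escape(item))
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # The "print statement" that emits the interface language
    print(render_list("Hello, World!", ["Texts & Models", "Hypertext"]))
```

Saving the printed output to a file in public_html would give a page a browser can open; the program itself never needs to know anything about layout.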
    8. 8. What is HTML?
        • HTML is not a programming language
            – Programming languages express IF … THEN logic
            – But it is code that obeys a syntax & gets interpreted
            – And it is produced and consumed by programs
        • HTML is a very general interface language
        • HTML is written in XML, which we discuss today
            – Technically called “XHTML”
            – The original version was written in SGML
    9. 9. In general, don’t conflate HTML with hypertext or with digital representation in general
    10. 10. HTML is a language that generates a species of hypertext which is, in turn, a species of digital representation
    11. 11. A provisional taxonomy
    12. 12. Is hypertext new?
    13. 13. [Study Bible]
    14. 14. [Talmud]
        1 = Mishna, the first major transcription of the oral law
        2 = Gemara, analytical discussions
        3 = Rashi, glossary
        4 = Tosefos, additions
        5 = Hananel, comments
        6 = Eye of Justice, legal decisions
        8 = Light of the Bible, references to Biblical quotations
        9 = Bach’s Annotations
        10 = Gra’s Annotations
    15. 15. [Charrette]
    16. 16. [The Waste Land]
    17. 17. [Critical Edition]
    18. 18. [OED]
    19. 19. These are all examples of traditional texts. They exhibit “latent hypertext”
    20. 20. Landow
        • The concept of hypertext parallels poststructuralist views of text
            – Barthes, Foucault, Derrida, Kristeva, et al.
        • In this view, a text is not, and has never been, a bounded, closed thing
            – it is a network of signifiers that connect meanings across time and space …
    21. 21. Digital humanists have been concerned with encoding historical texts since at least 1949
    22. 22. Father Busa
        • Creator of the Index Thomisticus
        • Saw the computer as a solution to indexing the works of Aquinas in 1949
            – 13,000,000 words
            – “in” took 4 years
        • Solution:
            – Lemmatization
            – Variations tagged as instances of a type
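To make "variations tagged as instances of a type" concrete, here is a small Python sketch of lemmatized indexing. The lemma table is a tiny hand-made assumption for illustration and is not Busa's actual data or method; a real project would rely on a full morphological lexicon.

```python
from collections import defaultdict

# Hypothetical, hand-made lemma table: each surface form (variation)
# maps to its dictionary headword (type).
LEMMAS = {
    "est": "sum", "sunt": "sum", "esse": "sum",
    "deus": "deus", "dei": "deus", "deum": "deus",
}


def index_by_lemma(tokens):
    """Group token positions under their lemma.

    Forms missing from LEMMAS are treated as their own lemma.
    """
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        form = token.lower()
        lemma = LEMMAS.get(form, form)
        index[lemma].append((position, token))
    return dict(index)


if __name__ == "__main__":
    for lemma, hits in index_by_lemma("Deus est et dei sunt".split()).items():
        print(lemma, hits)
```

Looking up one headword then retrieves every inflected form at once, which is what makes an index of a heavily inflected language usable.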
    23. 23. The complete works of Aquinas will be typed onto punch cards; the machines will then work through the words and produce a systematic index of every word St. Thomas used, together with the number of times it appears, where it appears, and the six words immediately preceding and following each appearance (to give the context). This will take the machines 8,125 hours; the same job would be likely to take one man a lifetime.
        Time Magazine, 1956, “Religion: Sacred: Electronics”
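The index described in the quotation (every word, where it appears, and the six words before and after) is what is now usually called a keyword-in-context concordance. The sketch below is one simple way to build it in Python; the function name `concordance` and the crude tokenizing rule are assumptions for this example, not a reconstruction of the punch-card procedure.

```python
import re
from collections import defaultdict


def concordance(text, window=6):
    """Map each word to a list of (position, words_before, words_after).

    Tokenizing on runs of letters and apostrophes is a simplification.
    """
    words = re.findall(r"[A-Za-z']+", text)
    index = defaultdict(list)
    for i, word in enumerate(words):
        before = words[max(0, i - window):i]
        after = words[i + 1:i + 1 + window]
        index[word.lower()].append((i, before, after))
    return index


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat"
    for position, before, after in concordance(sample)["dog"]:
        print(position, " ".join(before), "[dog]", " ".join(after))
```

The number of times a word appears is simply the length of its entry, e.g. `len(concordance(sample)["dog"])`.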
    24. 24. So, what is text? Let’s look at some material examples
    25. 25. page o’ text. Real world text comes packaged in documents
    26. 26. A document is a material artifact. How is text conveyed in a document?
    27. 27. What is text?
    28. 28. Visual Signifiers
        • Small caps
        • Indentation
        • Alignment
        • Italics
        • Space
        All used to signify elements of text
    29. 29. Documents have three Levels: Content, Structure, Style
        • Content
            – TEXT, images, video clips, etc.
        • Structure
            – The organization of content into units (elements) and logical relationships (e.g. reading order)
        • Style
            – Screen and print layout
            – Fonts, colors, etc.
    30. 30. Descriptive markup languages allow us to define structure of documents for computational purposes. Theoretically, they do not specify layout or content
    31. 31. [PDF, Procedural Markup] In contrast to procedural markup like PDF
    32. 32. So, how are docs structured?
    33. 33. Hierarchically … (theoretically)
    34. 34. Document Elements and Structures
        Play
            – Act +
                • Scene +
                    – Line +
        Book
            – Title
            – Chapter +
                • Verse +
        Letter
            – Heading
                • Return Address
                • Date
                • Recipient Info
                    – Name
                    – Address
            – Content
                • Salutation
                • Paragraph +
                • Closing
    35. 35. These are all “trees”
    36. 36. XML is a markup language
    37. 37. What is XML?
        • Stands for eXtensible Markup Language
            – Actually invented after the web
            – A simplification of SGML, the language used to create HTML
            – It specifies a set of rules for creating specialized markup languages such as HTML and TEI
        • It is a simplified version of SGML
            – Standard Generalized Markup Language
        • SGML was invented in the early 1970s to wrest the control of documents from computer people who were taking over industries like law and accounting
    38. 38. XML looks like this. Notice how the element names reference units, not layout or style
    39. 39. Also markup for “in-line” elements
    40. 40. XML Premises
        1. All documents are comprised of elements.
        2. Elements contain content.
        3. Elements have no layout.
        4. Elements are hierarchically ordered.
        5. Elements are to be indicated by “markup” – tags that define the beginning and end of an element
    41. 41. XML Markup Rules
        • Tags signify structural elements
        • Three kinds of tag
            – Start and End, e.g. <p> and </p>
            – Singleton, e.g. <br />
        • Start and singleton tags can have attributes
            – Simple key/value pairs
            – <div class="stanza" style="color:red;">
        • Basic rules
            – All attributes must be quoted
            – All tags must nest (no overlaps!)
    42. 42. Documents in XML that meet these rules are “well formed”
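One way to try out the well-formedness rules is to hand snippets to an XML parser and see which ones it rejects. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the parser accepts the markup as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    samples = [
        '<div class="stanza"><p>line<br /></p></div>',  # follows the rules
        '<div class=stanza><p>line</p></div>',          # unquoted attribute
        '<b><i>overlap</b></i>',                        # tags overlap
        '<p>unclosed',                                  # missing end tag
    ]
    for sample in samples:
        print(is_well_formed(sample), sample)
```

Only the first sample should pass; each of the others breaks one of the basic rules listed on the previous slide.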
    43. 43. XML also provides Document Types
        • A Document Type Definition (DTD) defines a set of tags and rules for using them
            – Specifies elements, attributes, and possible combinations
            – E.g. in HTML, the ol and ul elements must contain li elements
        • A DTD is just one kind of schema system used by XML
        • Schemas express data models of/for texts
            – TEI is a powerful way of describing primary source materials for scholars
        • Documents that use a schema properly are called “valid”
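Validity goes beyond well-formedness: the document must also obey the rules of its schema. Python's standard `xml.etree.ElementTree` does not validate against a DTD, so the sketch below hand-codes a single rule from the slide (children of ol and ul must be li) purely as an illustration. The function name `list_rule_violations` is an assumption, and a real project would use a validating parser with the actual DTD or schema.

```python
import xml.etree.ElementTree as ET

LIST_ELEMENTS = {"ol", "ul"}


def list_rule_violations(markup):
    """Return (list_tag, child_tag) pairs where a list holds a non-li child.

    This checks one hand-coded rule only; it is not DTD validation.
    """
    root = ET.fromstring(markup)
    problems = []
    for element in root.iter():
        if element.tag in LIST_ELEMENTS:
            for child in element:
                if child.tag != "li":
                    problems.append((element.tag, child.tag))
    return problems


if __name__ == "__main__":
    print(list_rule_violations("<body><ul><li>a</li><li>b</li></ul></body>"))
    print(list_rule_violations("<body><ol><li>a</li><p>b</p></ol></body>"))
```

Both sample documents are well formed, but only the first satisfies the rule, which is the distinction between "well formed" and "valid".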
    44. 44. Originally, DTDs defined “genres” like business letter or mortgage form. They were later used to define more abstract models of textual content
    45. 45. XML is used everywhere
        • HTML
            – E.g. Embed codes
        • TEI (Text Encoding Initiative)
        • RSS
        • Civilization IV
        • Playlists (e.g. XSPF or “spiff”)
        • Google Maps (KML)
    46. 46. A Look Again at HTML
        • aka XHTML
            – And now becoming HTML5
        • An instance of XML (formerly SGML)
        • An interface language
        • Language of the World Wide Web
        • Defined by a DTD that prescribes a specific set of elements and relations
    47. 47. HTML Document Structure
        • Head
            – Title
            – [Directives]
        • Body
            – H1+
            – H2+
                • P+
                • UL
                    – LI
    48. 48. Basic Elements with associated Tags
        Element         Tags                                          Attributes
        Paragraph       <p> ... </p>
        Numbered List   <ol> <li> ... </li> </ol>
        Bulleted List   <ul> <li> ... </li> </ul>
        Table           <table> <tr> <td> ... </td> </tr> </table>
        Anchor          <a> ... </a>                                  href, target
        Image           <img/>                                        src, border
        Object          <object> ... </object>
    49. 49. The Text Encoding Initiative created TEI to mark up scholarly documents. Mainly primary sources such as books and manuscripts
    50. 50. TEI
        • The dominant language used to encode scholarly text
        • This room was the location of UVa’s EText Center
            – World famous for text encoding
            – Now part of the library and catalog
        • Scholars create their own schemas to match what they are interested in
    51. 51. Examples
        • The TEI Header
            – http://tbe.kantl.be/TBE/examples/TBED02v00.htm
        • TEI Prose
            – http://tbe.kantl.be/TBE/examples/TBED03v00.htm
        • Find others at the TEI By Example Project
            – http://tbe.kantl.be/TBE/
    52. 52. XML contains an implicit theory of text. What is it?
    53. 53. OHCO
        • XML (and therefore HTML and TEI) imply a certain theory of text
            – A text is an OHCO
        • OHCO
            – Ordered Hierarchy of Content Objects
        • An OHCO is a kind of tree
            – Elements follow each other in sequences
            – Elements can contain other elements
    54. 54. What are the advantages of this view?
    55. 55. OHCO allows for easy processing
        • Every element has a precise address in the text
            – E.g. HTML/body/p[1]
        • Texts can be described in the language of kinship
            – Ancestors, parents, siblings, children, etc.
        • Texts can be restructured and manipulated by known patterns and algorithms
            – Traversing
            – Pruning
            – Cross-referencing
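The three advantages listed above (addresses, kinship, and traversal) can each be shown in a few lines of Python with the standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The sample document and the helper names `ancestors` and `outline` are invented for this sketch.

```python
import xml.etree.ElementTree as ET

# A made-up document used only for this example
DOC = """<html><head><title>Demo</title></head>
<body><h1>Heading</h1><p>First paragraph.</p><p>Second <em>paragraph</em>.</p></body></html>"""

root = ET.fromstring(DOC)

# 1. Address: the first p inside body, relative to the html root element
first_p = root.find("./body/p[1]")

# 2. Kinship: ElementTree elements do not store their parent,
#    so build a child-to-parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(element):
    """Return the tag names of an element's ancestors, nearest first."""
    chain = []
    while element in parent_of:
        element = parent_of[element]
        chain.append(element.tag)
    return chain


# 3. Traversal: walk the tree depth-first, recording each element's depth.
def outline(element, depth=0):
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


if __name__ == "__main__":
    print(first_p.text)
    print(ancestors(root.find(".//em")))
    for depth, tag in outline(root):
        print("  " * depth + tag)
```

Pruning and cross-referencing follow the same pattern: once every element has an address and a parent, removing a subtree or pointing from one element to another is a matter of ordinary tree operations.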
    56. 56. What are the disadvantages of OHCO?
    57. 57. Logical vs. Physical Structure
    58. 58. Pages and Paragraphs. Two common structures that overlap
    59. 59. Solution 1: Split Elements
        <page n="2">
          ...
          <p id="foo">His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife</p>
        </page>
        <page n="3">
          <p id="bar" prev_id="foo"> a very superior character to anything deserved by his own.</p>
          ...
        </page>
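With split elements, the page hierarchy is primary and the logical paragraph has to be reassembled by following the prev_id links. The Python sketch below shows one way to do that. The wrapping book element and the function name `rejoin_paragraphs` are assumptions added for the example, and it assumes fragments appear in reading order.

```python
import xml.etree.ElementTree as ET

# The slide's two pages, wrapped in a hypothetical <book> root element
PAGES = """<book>
<page n="2"><p id="foo">His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife</p></page>
<page n="3"><p id="bar" prev_id="foo"> a very superior character to anything deserved by his own.</p></page>
</book>"""


def rejoin_paragraphs(root):
    """Join p fragments linked by prev_id into whole paragraphs.

    Returns a dict keyed by the id of each paragraph's first fragment.
    """
    joined = {}
    head_of = {}  # fragment id -> id of the first fragment in its chain
    for p in root.iter("p"):
        pid, prev = p.get("id"), p.get("prev_id")
        head = head_of[prev] if prev in head_of else pid
        head_of[pid] = head
        joined[head] = joined.get(head, "") + "".join(p.itertext())
    return joined


if __name__ == "__main__":
    for first_id, text in rejoin_paragraphs(ET.fromstring(PAGES)).items():
        print(first_id, "->", text)
```

The extra bookkeeping is the price of this solution: the paragraph no longer exists as a single element, only as a chain of fragments.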
    60. 60. Solution 2: Use “Milestones”
        <p>His good looks and his rank had one fair claim on his attachment, since to them he must have owed a wife <pb n="3" /> a very superior character to anything deserved by his own.</p>
        One structure gets backgrounded
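With milestones, the paragraph stays whole and the page structure is the one that gets backgrounded, so recovering pages means cutting the text at each pb marker. A minimal Python sketch follows; the function name `text_by_page` and the `start_page` parameter are assumptions for illustration, and the sketch only handles pb elements that are direct children of the paragraph.

```python
import xml.etree.ElementTree as ET

PARAGRAPH = (
    '<p>His good looks and his rank had one fair claim on his attachment, '
    'since to them he must have owed a wife <pb n="3" /> a very superior '
    'character to anything deserved by his own.</p>'
)


def text_by_page(paragraph, start_page):
    """Split a paragraph's text into per-page pieces at <pb/> milestones.

    start_page is the number of the page on which the paragraph begins;
    the markup itself does not record it, so the caller must supply it.
    """
    pages = {start_page: paragraph.text or ""}
    current = start_page
    for child in paragraph:
        if child.tag == "pb":
            current = child.get("n")
            pages[current] = ""
        else:
            # Ordinary in-line element: keep its text on the current page
            pages[current] += "".join(child.itertext())
        # Text following the child belongs to whichever page is now current
        pages[current] += child.tail or ""
    return pages


if __name__ == "__main__":
    for page, text in text_by_page(ET.fromstring(PARAGRAPH), "2").items():
        print(page, repr(text))
```

Because a milestone is an empty element, the tree stays well formed, but the parser no longer knows that a page "contains" anything; that knowledge has to be rebuilt by code like this.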
    61. 61. Wittgenstein’s Manuscripts. What about this?
    62. 62. [Charrette]
    63. 63. The problem of overlap suggests the need for a richer set of tools
    64. 64. What tools do McCarty and Unsworth reference?
    65. 65. Tables
    66. 66. A database for Ovid
    67. 67. McCarty
        • A different use of markup
            – From document description to interpretation
            – Creative “misuse”
        • Reverse engineering a “grammar” of personification from a markup strategy
            – Thickness = description (of text)
            – Depth = explanation (of text by reference to grammar)
        • Is forced to use tables in collaboration with markup
    68. 68. Thick description = Markup; Deep description = Tables
    69. 69. How to reconcile these tools?
    70. 70. A Proposed Model
        • Texts are not documents
            – Documents are media, Texts are messages
        • Texts and documents are part of a system comprised of “levels”
            – They are effectively archaeological sites with stratigraphic layers
            – Erasures are like cities building on top of each other
        • Each level of the system is described by an appropriate set of tools
            – Document structures → XML
            – Textual structures, embedded ontologies → Tables
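One possible reading of this division of labor is sketched below in Python: the document structure lives in XML, while interpretive features live in a table whose rows point back at element ids. Everything specific here is an assumption made for illustration (the element names, the two sample lines, which are invented placeholders rather than quotations, the table layout, and the function names); it is not McCarty's actual database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Document level: structure in XML (invented placeholder lines)
DOC = ET.fromstring(
    '<poem><l id="l1">Envy bites her own heart</l>'
    '<l id="l2">and Sleep nods in his cave</l></poem>'
)

# Text level: interpretive features as table rows keyed by line id
ROWS = [
    ("l1", "Envy", "personification"),
    ("l2", "Sleep", "personification"),
]


def build_table(rows):
    """Load the feature rows into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE figure (line_id TEXT, phrase TEXT, kind TEXT)")
    db.executemany("INSERT INTO figure VALUES (?, ?, ?)", rows)
    return db


def figures_with_context(db, doc):
    """Pair each table row with the text of the line it points to."""
    text_of = {l.get("id"): "".join(l.itertext()) for l in doc.iter("l")}
    query = "SELECT line_id, phrase, kind FROM figure ORDER BY line_id"
    return [(phrase, kind, text_of[line_id])
            for line_id, phrase, kind in db.execute(query)]


if __name__ == "__main__":
    for row in figures_with_context(build_table(ROWS), DOC):
        print(row)
```

Keeping the features in a table means several overlapping interpretations can point at the same line without competing for a place in the single XML hierarchy.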
    71. 71. Basic Levels
        • Document
            – Physical objects (paper)
            – Logical objects (defined by space, style, punctuation, etc.)
            – Style and layout (also defined by space, color, etc.)
            – Can have superimposed versions
        • Text
            – Sequences of characters
            – Grammatical features
            – Figures and poetic features
            – Etc.